Details

Reviewers

ab
delena
srhines

Commits

rGaf171e7720ec: [X86] Updates to X86 backend for f16 promotion
rL237004: [X86] Updates to X86 backend for f16 promotion

Summary

r235215 adds support for treating f16 as a load/store type and for
promoting f16 operations to f32.

This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:

Set loadextaction for f16 vectors to expand.
Handle FP_EXTEND in a switch statement when handling v2f32.
Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or (store (SINT_TO_FP )) to a FILD.

Tests included.
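
For readers unfamiliar with f16 promotion, the following is a minimal sketch of what "promote to f32" means for a scalar operation: the half operands are widened to float and the arithmetic is done in float. The names `half_to_float` and `promoted_add` are hypothetical and exist only for illustration; this is not the compiler-rt `__gnu_h2f_ieee` source nor the code LLVM emits, just a software equivalent of the IEEE half-to-single extension.

```cpp
#include <cstdint>
#include <cstring>

// Widen an IEEE binary16 bit pattern to a float (always exact).
float half_to_float(std::uint16_t h) {
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;
    std::uint32_t man = h & 0x3FFu;
    std::uint32_t bits;
    if (exp == 0x1Fu) {
        // Infinity or NaN: all-ones exponent, keep the payload bits.
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        // Normal number: rebias the exponent from 15 to 127.
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        // Signed zero.
        bits = sign;
    } else {
        // Half subnormal: normalize the mantissa, since it is a normal float.
        std::uint32_t shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            ++shift;
        }
        man &= 0x3FFu;
        bits = sign | ((113u - shift) << 23) | (man << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// "Promoted" add: both operands are extended, and the add happens in f32.
// Truncating the result back to half is a separate step, omitted here.
float promoted_add(std::uint16_t a, std::uint16_t b) {
    return half_to_float(a) + half_to_float(b);
}
```

For example, `promoted_add(0x3C00, 0x4000)` widens the half encodings of 1.0 and 2.0 and adds them as floats.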

Diff Detail

Repository: rL LLVM

Event Timeline

pirama updated this revision to Diff 23977. Apr 17 2015, 4:48 PM

pirama retitled this revision to [X86] Updates to X86 backend for f16 promotion.

pirama updated this object.

pirama edited the test plan for this revision.

pirama added reviewers: ab, srhines, delena.

pirama added a subscriber: Unknown Object (MLST).

Minor update to test.

Ping.

ab added inline comments. Apr 28 2015, 11:39 AM

lib/Target/X86/X86ISelLowering.cpp
750–753 ↗	(On Diff #24067)	A longstanding item on my todo list is to properly support F16C, where this is legal, and we don't even need to scalarize vector conversions. I'm fine with doing that separately though, that might be tricky.
17317–17321 ↗	(On Diff #24067)	Why isn't this in the FP_TO_UINT block? Either way, please add uitofp/fptoui tests.
17367–17373 ↗	(On Diff #24067)	Does v2f32 even hit this? Without this patch it should trigger the default unreachable, why would that change?

pirama added inline comments. Apr 28 2015, 7:39 PM

lib/Target/X86/X86ISelLowering.cpp
17317–17321 ↗	(On Diff #24067)	This block is needed only for FP_TO_SINT with i64 return. OperationAction is set to custom around line 217: setOperationAction(ISD::FP_TO_SINT , MVT::i64 , Custom); setOperationAction(ISD::SINT_TO_FP , MVT::i64 , Custom); I'll add a test for uitofp and fptoui.
17367–17373 ↗	(On Diff #24067)	Around line 924 in this file: setOperationAction(ISD::FP_EXTEND, MVT::v2f32, Custom); is what necessitates this. v2f32 will be a valuetype of an FP_EXTEND node only when the source type is smaller (i.e. v2f16). I am not sure why the OperationAction is set as above, but it triggers the unreachable error for FP_EXTEND from v2f16. The included test for vec4 extend fails because vec4 is split during legalization. Now that I think about it, I should add a vec2 test so this test doesn't rot if vec4 is legal in the future.

delena added inline comments. Apr 30 2015, 1:36 AM

test/CodeGen/X86/half.ll
89 ↗	(On Diff #24067)	This instruction converts and stores the value. Could you please show the store in the CHECK, something like (%rdi)?
157 ↗	(On Diff #24067)	Why can't double to half go through the chain (double -> float -> half)? I'm just asking.

Added UINT tests
Added checks for STOREs

pirama added inline comments. Apr 30 2015, 12:21 PM

test/CodeGen/X86/half.ll
103 ↗	(On Diff #24764)	The fptoui generates two cvttss2si instructions along with a bunch of other instructions. I don't really understand what's going on. Here's the exact sequence for '-f16c':
	.cfi_def_cfa_offset 16
	movzwl (%rdi), %edi
	callq __gnu_h2f_ieee
	movss .LCPI9_0(%rip), %xmm1 # xmm1 = mem[0],zero,zero,zero
	movaps %xmm0, %xmm2
	subss %xmm1, %xmm2
	cvttss2si %xmm2, %rax
	movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
	xorq %rax, %rcx
	cvttss2si %xmm0, %rax
	ucomiss %xmm1, %xmm0
	cmovaeq %rcx, %rax
	popq %rdx
	retq
If you think this is optimal, I'll add CHECKs for the movabs, xor etc.
157 ↗	(On Diff #24067)	A chained conversion through float is done if unsafe math is enabled (see around line 3458 in lib/CodeGen/SelectionDAG/LegalizeDAG.cpp). Interestingly, half -> double goes through an intermediate float (this is done in the 'case ISD::FP16_TO_FP' just above). I am not sure why double -> float -> half is considered a precision loss, while the reverse chain isn't.

ab added inline comments. Apr 30 2015, 1:37 PM

test/CodeGen/X86/half.ll
157 ↗	(On Diff #24067)	The problem is double rounding: the first round can expose a tie to be broken by the second round, which wouldn't happen with single-step rounding. For instance, consider "1.49" with round-to-even. Rounded directly to 1 digit gets you 1. Rounding first to 2 digits gets you 1.5, and if you round that again to 1 digit, you get 2, a different result. The reverse is fine, since you're basically adding trailing zero bits to the mantissa. For 1.49, that would be akin to turning 1.49 into 1.49000. You can first do 1.490, then 1.49000: the result is the same.
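
The decimal "1.49" example can be checked mechanically. The sketch below is only an illustration of the double-rounding argument, using scaled integers (149 means 1.49) and round-half-to-even; the function names are hypothetical and nothing here is taken from LLVM.

```cpp
#include <cstdint>

// Round value/divisor to the nearest integer, ties to even.
// Assumes value >= 0 and divisor > 0.
std::int64_t round_half_even(std::int64_t value, std::int64_t divisor) {
    std::int64_t q = value / divisor;
    std::int64_t r = value % divisor;
    if (2 * r > divisor || (2 * r == divisor && (q & 1) != 0))
        ++q;
    return q;
}

// Single-step rounding: hundredths straight to whole units.
std::int64_t round_direct(std::int64_t hundredths) {
    return round_half_even(hundredths, 100);
}

// Two-step rounding: hundredths to tenths, then tenths to whole units.
// The first step can manufacture a tie that the second step then breaks.
std::int64_t round_two_step(std::int64_t hundredths) {
    std::int64_t tenths = round_half_even(hundredths, 10);
    return round_half_even(tenths, 10);
}
```

With the input 149, `round_direct` sees a remainder of 49 and rounds down, while `round_two_step` first produces 15 tenths (1.5), a tie that round-to-even then resolves upward. That is the same effect that makes double -> float -> half differ from a single double -> half rounding.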

ab added inline comments. Apr 30 2015, 3:15 PM

lib/Target/X86/X86ISelLowering.cpp
17367–17373 ↗	(On Diff #24067)	How about removing that line, and removing the FP_EXTEND block as well? Both seem pretty pointless IMO.
test/CodeGen/X86/half.ll
103 ↗	(On Diff #24764)	Looks like when we don't have a native FP_TO_UINT, we try to expand it using FP_TO_SINT. As I understand it, the intuition is: when i64 fptosi (f32 a) fits in 63 bits, we can just return that. If it doesn't, we can assume a has to be in [2^63; 2^64[ (otherwise the result is undefined), and we can first do a -= (f32)(1 << 63), equivalent in spirit to something like "(a & ~(1 << 63))" if a were an i64. Then you can just do (or (i64 fptosi (f32 a)), (1 << 63)). So, the code looks good. The fact that we use cvttss2si can be confusing, so it's probably best to match more of that logic.
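
A minimal C++ sketch of the expansion described in this comment, assuming an input in [0, 2^64). The function name `fptoui_via_fptosi` is hypothetical, and the sketch uses a branch where the generated code computes both conversions and selects with cmovae; it mirrors the logic, not the exact instruction sequence.

```cpp
#include <cstdint>

// Unsigned f32 -> u64 conversion built only from a signed conversion.
// Assumes 0 <= a < 2^64; other inputs are outside the sketch's contract.
std::uint64_t fptoui_via_fptosi(float a) {
    const float two63 = 9223372036854775808.0f;  // 2^63, exact in float
    if (a < two63) {
        // The value fits in 63 bits, so the signed conversion is enough.
        return static_cast<std::uint64_t>(static_cast<std::int64_t>(a));
    }
    // Subtract 2^63 (exact for floats in this range), convert the remainder
    // as a signed value, then set the top bit again. The remainder's top bit
    // is clear, so xor and or give the same result here.
    std::int64_t low = static_cast<std::int64_t>(a - two63);
    return static_cast<std::uint64_t>(low) ^ 0x8000000000000000ULL;
}
```

The constant 0x8000000000000000 plays the role of the movabsq immediate in the listing, and the comparison against 2^63 corresponds to the ucomiss that feeds cmovae.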

pirama added inline comments. Apr 30 2015, 5:42 PM

lib/Target/X86/X86ISelLowering.cpp
17367–17373 ↗	(On Diff #24067)	Removing the operation action causes test/CodeGen/X86/pr11334.ll to fail. For this test, DAGTypeLegalizer::WidenVectorOperand calls CustomLowerNode based on the operand's value type which eventually causes the v2f32 to be expanded to a v4f32 and use a VFPEXT (in LowerFP_EXTEND). I understood that operation action is always defined for the value type of a node and not on its operand's types. But, looks like that's not the case.
test/CodeGen/X86/half.ll
103 ↗	(On Diff #24764)	Aah, that makes sense. Thanks for the explanation!

Stricter matching for test_fptoui_i64

Looking at this again, the tests should be stricter, I think. Operands, perhaps -NEXT (on X86, you shouldn't have the scheduling/allocation problems you had last time; ARM is .. special), at the very least for the scalar tests.

With that, LGTM, thanks!

lib/Target/X86/X86ISelLowering.cpp
17517 ↗	(On Diff #24818)	Just "fallthrough" ?
17367–17373 ↗	(On Diff #24067)	Ah, that's nasty. (yes, the operation action is keyed off of different types depending on who's asking..) Your change is fine then, I don't have any better idea right now.
test/CodeGen/X86/half.ll
119 ↗	(On Diff #24818)	'g' -> 'q'

This revision is now accepted and ready to land. May 7 2015, 3:28 PM

Make scalar tests stricter
Use the correct tag, CHECK-F16C, in the tests (both old and new) instead of CHECK-FP16. (Nasty, yes)

Ahmed, Elena, I have made the scalar tests stricter. This update was slightly non-trivial. I'll wait for another review instead of pushing based on the prior LGTM.

Stricter tests indeed, go ahead. Thanks!

-Ahmed

test/CodeGen/X86/half.ll
1–2 ↗	(On Diff #25362)	You might want to wrap both of these lines, they're starting to get pretty unwieldy!

Split 'RUN:' lines in test

Closed by commit rL237004: [X86] Updates to X86 backend for f16 promotion (authored by pirama). May 11 2015, 10:18 AM

This revision was automatically updated to reflect the committed changes.

Diff 25482

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

[744 lines hidden]	for (MVT InnerVT : MVT::vector_valuetypes()) {
setLoadExtAction(ISD::ZEXTLOAD, InnerVT, VT, Expand);		setLoadExtAction(ISD::ZEXTLOAD, InnerVT, VT, Expand);

// N.b. ISD::EXTLOAD legality is basically ignored except for i1-like		// N.b. ISD::EXTLOAD legality is basically ignored except for i1-like
// types, we have to deal with them whether we ask for Expansion or not.		// types, we have to deal with them whether we ask for Expansion or not.
// Setting Expand causes its own optimisation problems though, so leave		// Setting Expand causes its own optimisation problems though, so leave
// them legal.		// them legal.
if (VT.getVectorElementType() == MVT::i1)		if (VT.getVectorElementType() == MVT::i1)
setLoadExtAction(ISD::EXTLOAD, InnerVT, VT, Expand);		setLoadExtAction(ISD::EXTLOAD, InnerVT, VT, Expand);

		// EXTLOAD for MVT::f16 vectors is not legal because f16 vectors are
		// split/scalarized right now.
		if (VT.getVectorElementType() == MVT::f16)
		setLoadExtAction(ISD::EXTLOAD, InnerVT, VT, Expand);
}		}
}		}

// FIXME: In order to prevent SSE instructions being expanded to MMX ones		// FIXME: In order to prevent SSE instructions being expanded to MMX ones
// with -msoft-float, disable use of MMX as well.		// with -msoft-float, disable use of MMX as well.
if (!TM.Options.UseSoftFloat && Subtarget->hasMMX()) {		if (!TM.Options.UseSoftFloat && Subtarget->hasMMX()) {
addRegisterClass(MVT::x86mmx, &X86::VR64RegClass);		addRegisterClass(MVT::x86mmx, &X86::VR64RegClass);
// No operations on x86mmx supported, everything uses intrinsics.		// No operations on x86mmx supported, everything uses intrinsics.
[16,857 lines hidden]	void X86TargetLowering::ReplaceNodeResults(SDNode *N,
case ISD::UREM:		case ISD::UREM:
case ISD::SDIVREM:		case ISD::SDIVREM:
case ISD::UDIVREM: {		case ISD::UDIVREM: {
SDValue V = LowerWin64_i128OP(SDValue(N,0), DAG);		SDValue V = LowerWin64_i128OP(SDValue(N,0), DAG);
Results.push_back(V);		Results.push_back(V);
return;		return;
}		}
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
		// FP_TO_INT*_IN_MEM is not legal for f16 inputs. Do not convert
		// (FP_TO_SINT (load f16)) to FP_TO_INT*.
		if (N->getOperand(0).getValueType() == MVT::f16)
		break;
		// fallthrough
case ISD::FP_TO_UINT: {		case ISD::FP_TO_UINT: {
bool IsSigned = N->getOpcode() == ISD::FP_TO_SINT;		bool IsSigned = N->getOpcode() == ISD::FP_TO_SINT;

if (!IsSigned && !isIntegerTypeFTOL(SDValue(N, 0).getValueType()))		if (!IsSigned && !isIntegerTypeFTOL(SDValue(N, 0).getValueType()))
return;		return;

std::pair<SDValue,SDValue> Vals =		std::pair<SDValue,SDValue> Vals =
FP_TO_INTHelper(SDValue(N, 0), DAG, IsSigned, /*IsReplace=*/ true);		FP_TO_INTHelper(SDValue(N, 0), DAG, IsSigned, /*IsReplace=*/ true);
[29 lines hidden]	void X86TargetLowering::ReplaceNodeResults(SDNode *N,
}		}
case ISD::FP_ROUND: {		case ISD::FP_ROUND: {
if (!TLI.isTypeLegal(N->getOperand(0).getValueType()))		if (!TLI.isTypeLegal(N->getOperand(0).getValueType()))
return;		return;
SDValue V = DAG.getNode(X86ISD::VFPROUND, dl, MVT::v4f32, N->getOperand(0));		SDValue V = DAG.getNode(X86ISD::VFPROUND, dl, MVT::v4f32, N->getOperand(0));
Results.push_back(V);		Results.push_back(V);
return;		return;
}		}
		case ISD::FP_EXTEND: {
		// Right now, only MVT::v2f32 has OperationAction for FP_EXTEND.
		// No other ValueType for FP_EXTEND should reach this point.
		assert(N->getValueType(0) == MVT::v2f32 &&
		"Do not know how to legalize this Node");
		return;
		}
case ISD::INTRINSIC_W_CHAIN: {		case ISD::INTRINSIC_W_CHAIN: {
unsigned IntNo = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();		unsigned IntNo = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
switch (IntNo) {		switch (IntNo) {
default : llvm_unreachable("Do not know how to custom type "		default : llvm_unreachable("Do not know how to custom type "
"legalize this intrinsic operation!");		"legalize this intrinsic operation!");
case Intrinsic::x86_rdtsc:		case Intrinsic::x86_rdtsc:
return getReadTimeStampCounter(N, dl, X86ISD::RDTSC_DAG, DAG, Subtarget,		return getReadTimeStampCounter(N, dl, X86ISD::RDTSC_DAG, DAG, Subtarget,
Results);		Results);
[6,337 lines hidden]	if (InVT == MVT::v8i8 || InVT == MVT::v4i8) {
return DAG.getNode(ISD::SINT_TO_FP, dl, N->getValueType(0), P);		return DAG.getNode(ISD::SINT_TO_FP, dl, N->getValueType(0), P);
}		}

// Transform (SINT_TO_FP (i64 ...)) into an x87 operation if we have		// Transform (SINT_TO_FP (i64 ...)) into an x87 operation if we have
// a 32-bit target where SSE doesn't support i64->FP operations.		// a 32-bit target where SSE doesn't support i64->FP operations.
if (Op0.getOpcode() == ISD::LOAD) {		if (Op0.getOpcode() == ISD::LOAD) {
LoadSDNode *Ld = cast<LoadSDNode>(Op0.getNode());		LoadSDNode *Ld = cast<LoadSDNode>(Op0.getNode());
EVT VT = Ld->getValueType(0);		EVT VT = Ld->getValueType(0);

		// This transformation is not supported if the result type is f16
		if (N->getValueType(0) == MVT::f16)
		return SDValue();

if (!Ld->isVolatile() && !N->getValueType(0).isVector() &&		if (!Ld->isVolatile() && !N->getValueType(0).isVector() &&
ISD::isNON_EXTLoad(Op0.getNode()) && Op0.hasOneUse() &&		ISD::isNON_EXTLoad(Op0.getNode()) && Op0.hasOneUse() &&
!Subtarget->is64Bit() && VT == MVT::i64) {		!Subtarget->is64Bit() && VT == MVT::i64) {
SDValue FILDChain = Subtarget->getTargetLowering()->BuildFILD(		SDValue FILDChain = Subtarget->getTargetLowering()->BuildFILD(
SDValue(N, 0), Ld->getValueType(0), Ld->getChain(), Op0, DAG);		SDValue(N, 0), Ld->getValueType(0), Ld->getChain(), Op0, DAG);
DAG.ReplaceAllUsesOfValueWith(Op0.getValue(1), FILDChain.getValue(1));		DAG.ReplaceAllUsesOfValueWith(Op0.getValue(1), FILDChain.getValue(1));
return FILDChain;		return FILDChain;
}		}
[1,075 lines hidden]

llvm/trunk/test/CodeGen/X86/half.ll

; RUN: llc < %s -march=x86-64 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 -mattr=-f16c | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-LIBCALL		; RUN: llc < %s -march=x86-64 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 -mattr=-f16c -asm-verbose=false \
; RUN: llc < %s -march=x86-64 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 -mattr=+f16c | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-F16C		; RUN: | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-LIBCALL
		; RUN: llc < %s -march=x86-64 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7 -mattr=+f16c -asm-verbose=false \
		; RUN: | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-F16C

define void @test_load_store(half* %in, half* %out) {		define void @test_load_store(half* %in, half* %out) {
; CHECK-LABEL: test_load_store:		; CHECK-LABEL: test_load_store:
; CHECK: movw (%rdi), [[TMP:%[a-z0-9]+]]		; CHECK: movw (%rdi), [[TMP:%[a-z0-9]+]]
; CHECK: movw [[TMP]], (%rsi)		; CHECK: movw [[TMP]], (%rsi)
%val = load half, half* %in		%val = load half, half* %in
store half %val, half* %out		store half %val, half* %out
ret void		ret void
[14 lines hidden]	; CHECK: movw %si, (%rdi)
store half %val_fp, half* %addr		store half %val_fp, half* %addr
ret void		ret void
}		}

define float @test_extend32(half* %addr) {		define float @test_extend32(half* %addr) {
; CHECK-LABEL: test_extend32:		; CHECK-LABEL: test_extend32:

; CHECK-LIBCALL: jmp __gnu_h2f_ieee		; CHECK-LIBCALL: jmp __gnu_h2f_ieee
; CHECK-FP16: vcvtph2ps		; CHECK-F16C: vcvtph2ps
%val16 = load half, half* %addr		%val16 = load half, half* %addr
%val32 = fpext half %val16 to float		%val32 = fpext half %val16 to float
ret float %val32		ret float %val32
}		}

define double @test_extend64(half* %addr) {		define double @test_extend64(half* %addr) {
; CHECK-LABEL: test_extend64:		; CHECK-LABEL: test_extend64:

; CHECK-LIBCALL: callq __gnu_h2f_ieee		; CHECK-LIBCALL: callq __gnu_h2f_ieee
; CHECK-LIBCALL: cvtss2sd		; CHECK-LIBCALL: cvtss2sd
; CHECK-FP16: vcvtph2ps		; CHECK-F16C: vcvtph2ps
; CHECK-FP16: vcvtss2sd		; CHECK-F16C: vcvtss2sd
%val16 = load half, half* %addr		%val16 = load half, half* %addr
%val32 = fpext half %val16 to double		%val32 = fpext half %val16 to double
ret double %val32		ret double %val32
}		}

define void @test_trunc32(float %in, half* %addr) {		define void @test_trunc32(float %in, half* %addr) {
; CHECK-LABEL: test_trunc32:		; CHECK-LABEL: test_trunc32:

; CHECK-LIBCALL: callq __gnu_f2h_ieee		; CHECK-LIBCALL: callq __gnu_f2h_ieee
; CHECK-FP16: vcvtps2ph		; CHECK-F16C: vcvtps2ph
%val16 = fptrunc float %in to half		%val16 = fptrunc float %in to half
store half %val16, half* %addr		store half %val16, half* %addr
ret void		ret void
}		}

define void @test_trunc64(double %in, half* %addr) {		define void @test_trunc64(double %in, half* %addr) {
; CHECK-LABEL: test_trunc64:		; CHECK-LABEL: test_trunc64:

; CHECK-LIBCALL: callq __truncdfhf2		; CHECK-LIBCALL: callq __truncdfhf2
; CHECK-FP16: callq __truncdfhf2		; CHECK-F16C: callq __truncdfhf2
%val16 = fptrunc double %in to half		%val16 = fptrunc double %in to half
store half %val16, half* %addr		store half %val16, half* %addr
ret void		ret void
}		}

		define i64 @test_fptosi_i64(half* %p) #0 {
		; CHECK-LABEL: test_fptosi_i64:

		; CHECK-LIBCALL-NEXT: pushq %rax
		; CHECK-LIBCALL-NEXT: movzwl (%rdi), %edi
		; CHECK-LIBCALL-NEXT: callq __gnu_h2f_ieee
		; CHECK-LIBCALL-NEXT: cvttss2si %xmm0, %rax
		; CHECK-LIBCALL-NEXT: popq %rdx
		; CHECK-LIBCALL-NEXT: retq

		; CHECK-F16C-NEXT: movswl (%rdi), [[REG0:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vmovd [[REG0]], [[REG1:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vcvtph2ps [[REG1]], [[REG2:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vcvttss2si [[REG2]], %rax
		; CHECK-F16C-NEXT: retq
		%a = load half, half* %p, align 2
		%r = fptosi half %a to i64
		ret i64 %r
		}

		define void @test_sitofp_i64(i64 %a, half* %p) #0 {
		; CHECK-LABEL: test_sitofp_i64:

		; CHECK-LIBCALL-NEXT: pushq [[ADDR:%[a-z]+]]
		; CHECK-LIBCALL-NEXT: movq %rsi, [[ADDR]]
		; CHECK-LIBCALL-NEXT: cvtsi2ssq %rdi, %xmm0
		; CHECK-LIBCALL-NEXT: callq __gnu_f2h_ieee
		; CHECK-LIBCALL-NEXT: movw %ax, ([[ADDR]])
		; CHECK-LIBCALL-NEXT: popq [[ADDR]]
		; CHECK-LIBCALL-NEXT: retq

		; CHECK-F16C-NEXT: vcvtsi2ssq %rdi, [[REG0:%[a-z0-9]+]], [[REG0]]
		; CHECK-F16C-NEXT: vcvtps2ph $0, [[REG0]], [[REG0]]
		; CHECK-F16C-NEXT: vmovd [[REG0]], %eax
		; CHECK-F16C-NEXT: movw %ax, (%rsi)
		; CHECK-F16C-NEXT: retq
		%r = sitofp i64 %a to half
		store half %r, half* %p
		ret void
		}

		define i64 @test_fptoui_i64(half* %p) #0 {
		; CHECK-LABEL: test_fptoui_i64:

		; FP_TO_UINT is expanded using FP_TO_SINT
		; CHECK-LIBCALL-NEXT: pushq %rax
		; CHECK-LIBCALL-NEXT: movzwl (%rdi), %edi
		; CHECK-LIBCALL-NEXT: callq __gnu_h2f_ieee
		; CHECK-LIBCALL-NEXT: movss {{.[A-Z_0-9]+}}(%rip), [[REG1:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: movaps %xmm0, [[REG2:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: subss [[REG1]], [[REG2]]
		; CHECK-LIBCALL-NEXT: cvttss2si [[REG2]], [[REG3:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: movabsq $-9223372036854775808, [[REG4:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: xorq [[REG3]], [[REG4]]
		; CHECK-LIBCALL-NEXT: cvttss2si %xmm0, [[REG5:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: ucomiss [[REG1]], %xmm0
		; CHECK-LIBCALL-NEXT: cmovaeq [[REG4]], [[REG5]]
		; CHECK-LIBCALL-NEXT: popq %rdx
		; CHECK-LIBCALL-NEXT: retq

		; CHECK-F16C-NEXT: movswl (%rdi), [[REG0:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vmovd [[REG0]], [[REG1:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vcvtph2ps [[REG1]], [[REG2:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vmovss {{.[A-Z_0-9]+}}(%rip), [[REG3:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vsubss [[REG3]], [[REG2]], [[REG4:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vcvttss2si [[REG4]], [[REG5:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: movabsq $-9223372036854775808, [[REG6:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: xorq [[REG5]], [[REG6:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vcvttss2si [[REG2]], [[REG7:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vucomiss [[REG3]], [[REG2]]
		; CHECK-F16C-NEXT: cmovaeq [[REG6]], %rax
		; CHECK-F16C-NEXT: retq
		%a = load half, half* %p, align 2
		%r = fptoui half %a to i64
		ret i64 %r
		}

		define void @test_uitofp_i64(i64 %a, half* %p) #0 {
		; CHECK-LABEL: test_uitofp_i64:
		; CHECK-LIBCALL-NEXT: pushq [[ADDR:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: movq %rsi, [[ADDR]]
		; CHECK-NEXT: movl %edi, [[REG0:%[a-z0-9]+]]
		; CHECK-NEXT: andl $1, [[REG0]]
		; CHECK-NEXT: testq %rdi, %rdi
		; CHECK-NEXT: js [[LABEL1:.LBB[0-9_]+]]

		; simple conversion to float if non-negative
		; CHECK-LIBCALL-NEXT: cvtsi2ssq %rdi, [[REG1:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vcvtsi2ssq %rdi, [[REG1:%[a-z0-9]+]], [[REG1]]
		; CHECK-NEXT: jmp [[LABEL2:.LBB[0-9_]+]]

		; convert using shift+or if negative
		; CHECK-NEXT: [[LABEL1]]:
		; CHECK-NEXT: shrq %rdi
		; CHECK-NEXT: orq %rdi, [[REG2:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: cvtsi2ssq [[REG2]], [[REG3:%[a-z0-9]+]]
		; CHECK-LIBCALL-NEXT: addss [[REG3]], [[REG1]]
		; CHECK-F16C-NEXT: vcvtsi2ssq [[REG2]], [[REG3:%[a-z0-9]+]], [[REG3]]
		; CHECK-F16C-NEXT: vaddss [[REG3]], [[REG3]], [[REG1:[%a-z0-9]+]]

		; convert float to half
		; CHECK-NEXT: [[LABEL2]]:
		; CHECK-LIBCALL-NEXT: callq __gnu_f2h_ieee
		; CHECK-LIBCALL-NEXT: movw %ax, ([[ADDR]])
		; CHECK-LIBCALL-NEXT: popq [[ADDR]]
		; CHECK-F16C-NEXT: vcvtps2ph $0, [[REG1]], [[REG4:%[a-z0-9]+]]
		; CHECK-F16C-NEXT: vmovd [[REG4]], %eax
		; CHECK-F16C-NEXT: movw %ax, (%rsi)
		; CHECK-NEXT: retq

		%r = uitofp i64 %a to half
		store half %r, half* %p
		ret void
		}

		define <4 x float> @test_extend32_vec4(<4 x half>* %p) #0 {
		; CHECK-LABEL: test_extend32_vec4:

		; CHECK-LIBCALL: callq __gnu_h2f_ieee
		; CHECK-LIBCALL: callq __gnu_h2f_ieee
		; CHECK-LIBCALL: callq __gnu_h2f_ieee
		; CHECK-LIBCALL: callq __gnu_h2f_ieee
		; CHECK-F16C: vcvtph2ps
		; CHECK-F16C: vcvtph2ps
		; CHECK-F16C: vcvtph2ps
		; CHECK-F16C: vcvtph2ps
		%a = load <4 x half>, <4 x half>* %p, align 8
		%b = fpext <4 x half> %a to <4 x float>
		ret <4 x float> %b
		}

		define <4 x double> @test_extend64_vec4(<4 x half>* %p) #0 {
		; CHECK-LABEL: test_extend64_vec4

		; CHECK-LIBCALL: callq __gnu_h2f_ieee
		; CHECK-LIBCALL-DAG: callq __gnu_h2f_ieee
		; CHECK-LIBCALL-DAG: callq __gnu_h2f_ieee
		; CHECK-LIBCALL-DAG: callq __gnu_h2f_ieee
		; CHECK-LIBCALL-DAG: cvtss2sd
		; CHECK-LIBCALL-DAG: cvtss2sd
		; CHECK-LIBCALL-DAG: cvtss2sd
		; CHECK-LIBCALL: cvtss2sd
		; CHECK-F16C: vcvtph2ps
		; CHECK-F16C-DAG: vcvtph2ps
		; CHECK-F16C-DAG: vcvtph2ps
		; CHECK-F16C-DAG: vcvtph2ps
		; CHECK-F16C-DAG: vcvtss2sd
		; CHECK-F16C-DAG: vcvtss2sd
		; CHECK-F16C-DAG: vcvtss2sd
		; CHECK-F16C: vcvtss2sd
		%a = load <4 x half>, <4 x half>* %p, align 8
		%b = fpext <4 x half> %a to <4 x double>
		ret <4 x double> %b
		}

		define void @test_trunc32_vec4(<4 x float> %a, <4 x half>* %p) {
		; CHECK-LABEL: test_trunc32_vec4:

		; CHECK-LIBCALL: callq __gnu_f2h_ieee
		; CHECK-LIBCALL: callq __gnu_f2h_ieee
		; CHECK-LIBCALL: callq __gnu_f2h_ieee
		; CHECK-LIBCALL: callq __gnu_f2h_ieee
		; CHECK-F16C: vcvtps2ph
		; CHECK-F16C: vcvtps2ph
		; CHECK-F16C: vcvtps2ph
		; CHECK-F16C: vcvtps2ph
		; CHECK: movw
		; CHECK: movw
		; CHECK: movw
		; CHECK: movw
		%v = fptrunc <4 x float> %a to <4 x half>
		store <4 x half> %v, <4 x half>* %p
		ret void
		}

		define void @test_trunc64_vec4(<4 x double> %a, <4 x half>* %p) {
		; CHECK-LABEL: test_trunc64_vec4:
		; CHECK: callq __truncdfhf2
		; CHECK: callq __truncdfhf2
		; CHECK: callq __truncdfhf2
		; CHECK: callq __truncdfhf2
		; CHECK: movw
		; CHECK: movw
		; CHECK: movw
		; CHECK: movw
		%v = fptrunc <4 x double> %a to <4 x half>
		store <4 x half> %v, <4 x half>* %p
		ret void
		}

		attributes #0 = { nounwind }
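
The `test_uitofp_i64` checks above (andl $1, testq/js, shrq, orq, then a signed convert followed by an add) match a common way of building an unsigned i64 -> f32 conversion from a signed one. The following is a hedged C++ sketch of that pattern; `uitofp_via_sitofp` is a hypothetical name, and the code illustrates the logic rather than reproducing LLVM's lowering.

```cpp
#include <cstdint>

// Unsigned u64 -> f32 conversion built from a signed conversion.
float uitofp_via_sitofp(std::uint64_t a) {
    if (static_cast<std::int64_t>(a) >= 0) {
        // Top bit clear: the signed conversion already gives the right value.
        return static_cast<float>(static_cast<std::int64_t>(a));
    }
    // Top bit set: halve the value so it fits in a signed i64, keeping the
    // shifted-out low bit as a sticky bit so rounding is unaffected, then
    // convert and double the result.
    std::uint64_t halved = (a >> 1) | (a & 1);
    float f = static_cast<float>(static_cast<std::int64_t>(halved));
    return f + f;
}
```

The final step in the test, converting the float to half via `__gnu_f2h_ieee` or `vcvtps2ph`, is a separate truncation and is not part of this sketch.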

This is an archive of the discontinued LLVM Phabricator instance.

[X86] Updates to X86 backend for f16 promotion
Closed, Public