This is an archive of the discontinued LLVM Phabricator instance.

[X86][FastISel] Teach how to select float-half conversion intrinsics.
ClosedPublic

Authored by andreadb on Feb 16 2015, 8:46 AM.


Details

Reviewers

qcolombet
mkuper
ributzka

Commits

rG7035178aebc9: [X86][FastIsel] Teach how to select float-half conversion intrinsics.
rL230043: [X86][FastIsel] Teach how to select float-half conversion intrinsics.

Summary

This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and intrinsic 'convert_to_fp16'.

If the target has F16C (and no -soft-float), we can select instruction VCVTPS2PHrr for a float-to-half conversion, and VCVTPH2PSrr for a half-to-float conversion.
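
For reference, the half-to-float direction that VCVTPH2PSrr performs in hardware can be modelled in portable C++. The sketch below is illustrative only and is not part of the patch; the name `halfBitsToFloat` is hypothetical, and NaN payloads are deliberately not preserved.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Illustrative software model (not from the patch): decode an IEEE 754
// binary16 bit pattern into a float.
float halfBitsToFloat(std::uint16_t h) {
  const int sign = (h >> 15) & 0x1;
  const int exponent = (h >> 10) & 0x1F;
  const int mantissa = h & 0x3FF;

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: value = mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // Infinity or NaN (the NaN payload is dropped in this sketch).
    magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: (1 + mantissa / 1024) * 2^(exponent - 15),
    // rewritten as (1024 + mantissa) * 2^(exponent - 25).
    magnitude = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -magnitude : magnitude;
}
```

Every binary16 value is exactly representable as a float, so this direction needs no rounding control, which matches the fact that only VCVTPS2PHrr takes a rounding immediate.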

Added test fast-isel-float-half-convertion.ll to check that fast-isel doesn't fail to select float-half conversions if the target has F16C.

Please let me know if ok to submit.

Thanks,
Andrea

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb updated this revision to Diff 20036. Feb 16 2015, 8:46 AM

andreadb retitled this revision to [X86][FastISel] Teach how to select float-half conversion intrinsics.

andreadb updated this object.

andreadb edited the test plan for this revision.

andreadb added reviewers: mkuper, ributzka, qcolombet.

andreadb added a subscriber: Unknown Object (MLST).

ab added a subscriber: ab. Feb 16 2015, 11:44 AM

qcolombet added inline comments. Feb 17 2015, 11:03 AM

lib/Target/X86/X86FastISel.cpp
2149 ↗	(On Diff #20036)	Shouldn’t we have some checks that the type is not double for any cases?
2162 ↗	(On Diff #20036)	I think it would be cleaner to generate:
    res = implicit_def
    res2 = insert_subreg res, inputreg, 0
A copy with mismatching size sounds wrong to me.
2182 ↗	(On Diff #20036)	EXTRACT_SUBREG here I believe.
test/CodeGen/X86/fast-isel-float-half-convertion.ll
7 ↗	(On Diff #20036)	Could you add tests with doubles? I may be wrong but I thought the intrinsic allows any floating type.

Hi Quentin,

lib/Target/X86/X86FastISel.cpp
2149 ↗	(On Diff #20036)	Right, I should check that neither the operand nor the return type is double. I didn't take into account the fact that the intrinsic allows any floating point type.
2162 ↗	(On Diff #20036)	Ok, I will change it.
2182 ↗	(On Diff #20036)	I will fix it.
test/CodeGen/X86/fast-isel-float-half-convertion.ll
7 ↗	(On Diff #20036)	Right, the intrinsic allows any floating point type. What if I add those tests to a separate test file (maybe an XFAIL test)? My concern is that if I add extra tests for doubles to this same file, the test will start failing because of the flag -fast-isel-abort. What do you think?

qcolombet added inline comments. Feb 17 2015, 1:02 PM

test/CodeGen/X86/fast-isel-float-half-convertion.ll
7 ↗	(On Diff #20036)	Good point. Sounds good to me.

Hi Quentin,

Here is a new version of the patch, which hopefully addresses all your comments.

This patch checks that the operand type of intrinsic 'convert_to_fp16' is 'float', and that the return type of intrinsic 'convert_from_fp16' is 'float'. Those checks are required because both intrinsics may accept 'any' floating point type (even 'double' and 'long double').
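
The combined guard (F16C available, no soft-float, and the non-half side of the conversion being exactly 'float') can be sketched as a standalone predicate. This is a simplified model under stated assumptions, not the code in X86FastISel.cpp: the types `Ty`, `Conversion`, `TargetFeatures`, and `ConversionCall` and the function `canSelectWithF16C` are hypothetical stand-ins for the LLVM classes the patch actually queries.

```cpp
// Hypothetical, simplified stand-ins for the LLVM types used by the patch.
enum class Ty { I16, Float, Double, X86FP80 };
enum class Conversion { ToFP16, FromFP16 };

struct TargetFeatures {
  bool HasF16C;
  bool UseSoftFloat;
};

struct ConversionCall {
  Conversion Kind;
  Ty OperandTy; // type of the single intrinsic argument
  Ty ResultTy;  // type returned by the intrinsic
};

// Returns true when the call can be lowered to a single F16C instruction.
bool canSelectWithF16C(const ConversionCall &Call, const TargetFeatures &TF) {
  if (TF.UseSoftFloat || !TF.HasF16C)
    return false;

  // F16C only converts between half and float, so the wide side of the
  // conversion must be exactly 'float'.
  if (Call.Kind == Conversion::ToFP16)
    return Call.OperandTy == Ty::Float;
  return Call.ResultTy == Ty::Float;
}
```

Returning false here corresponds to fast-isel declining the intrinsic so that the regular selector handles it, which is what happens for 'double' and 'long double'.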

As you suggested, I added another test (named 'fast-isel-float-double-convertion.ll') to check that fast-isel doesn't accidentally select a wrong instruction for double-to-half conversions. This new test is currently marked XFAIL since fast-isel only knows how to select float-to-half and half-to-float conversions.

In the previous patch you suggested to use an INSERT_SUBREG to perform an element insertion into a vector.
However, INSERT_SUBREG requires a valid sub-register index operand to identify which sub-register we want to address. Unfortunately, register class VR128 doesn't define any sub-register index; therefore we cannot use insert_subreg to address the lower 32 bits of a VR128 register.

Instead, I implemented the element insertion (from GR32 to VR128) using tablegen'd function 'fastEmit_r' to emit the equivalent of a SCALAR_TO_VECTOR.
Conversions from FR32-to-VR128 are implicitly handled by method 'constrainOperandRegClass' (used by all the 'fastEmitInst_*' methods in FastISel).
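
One way to see why widening the i16 input to 32 bits before the SCALAR_TO_VECTOR is harmless: VCVTPH2PS reads its first half-precision element from bits 15:0 of the source register, and only the first float of the result is kept. A minimal model of that bit-level argument follows; the helper names are hypothetical and the functions only mimic the data movement, not the instructions themselves.

```cpp
#include <cstdint>

// Models the explicit sign-extension from i16 to i32 before the value is
// moved into the low 32-bit lane of a vector register.
std::uint32_t signExtendHalfBitsToLane(std::uint16_t halfBits) {
  const std::int16_t asSigned = static_cast<std::int16_t>(halfBits);
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(asSigned));
}

// Models which bits of the low lane feed the first converted element:
// only bits 15:0 are read for element 0.
std::uint16_t firstHalfElementOfLane(std::uint32_t lane) {
  return static_cast<std::uint16_t>(lane & 0xFFFFu);
}
```

For an input such as 0xC000 the extended lane is 0xFFFFC000. The extra 0xFFFF lands in the second half-precision element, which is converted but then discarded when the result is copied to FR32.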

We cannot use an 'extract_subreg' to extract an FR32 from a VR128 for the same reason that we cannot use 'insert_subreg' to promote an FR32 to VR128 (i.e. there is no sub-register index that we can use). I found out that it is perfectly ok to 'copy' from register class VR128 to class FR32; the two classes are basically identical except for the accepted value types. This is also what ISel normally does when promoting FR32 to VR128 (and from VR128 to FR32). See for example the tablegen patterns in X86InstrSSE.td.

For example:

(f32 (vector_extract (v4f32 VR128:$src), (iPTR 0))) -> 
    (COPY_TO_REGCLASS (v4f32 VR128:$src), FR32)

(v4f32 (scalar_to_vector FR32:$src)) ->
    (COPY_TO_REGCLASS FR32:$src, VR128)

Please let me know if ok to submit.

Thanks,
Andrea

Hi Andrea,

LGTM.

Thanks for checking.

Quentin

This revision is now accepted and ready to land. Feb 20 2015, 10:39 AM

Closed by commit rL230043: [X86][FastIsel] Teach how to select float-half conversion intrinsics. (authored by adibiagio). Feb 20 2015, 11:39 AM

This revision was automatically updated to reflect the committed changes.

Thanks Quentin!
Committed revision 230043.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86FastISel.cpp	62 lines
llvm/trunk/test/CodeGen/X86/fast-isel-double-half-convertion.ll	23 lines
llvm/trunk/test/CodeGen/X86/fast-isel-float-half-convertion.ll	28 lines

Diff 20423

llvm/trunk/lib/Target/X86/X86FastISel.cpp

   return true;
 }

 bool X86FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
   // FIXME: Handle more intrinsics.
   switch (II->getIntrinsicID()) {
   default: return false;
+  case Intrinsic::convert_from_fp16:
+  case Intrinsic::convert_to_fp16: {
+    if (TM.Options.UseSoftFloat || !Subtarget->hasF16C())
+      return false;
+
+    const Value *Op = II->getArgOperand(0);
+    unsigned InputReg = getRegForValue(Op);
+    if (InputReg == 0)
+      return false;
+
+    // F16C only allows converting from float to half and from half to float.
+    bool IsFloatToHalf = II->getIntrinsicID() == Intrinsic::convert_to_fp16;
+    if (IsFloatToHalf) {
+      if (!Op->getType()->isFloatTy())
+        return false;
+    } else {
+      if (!II->getType()->isFloatTy())
+        return false;
+    }
+
+    unsigned ResultReg = 0;
+    const TargetRegisterClass *RC = TLI.getRegClassFor(MVT::v8i16);
+    if (IsFloatToHalf) {
+      // 'InputReg' is implicitly promoted from register class FR32 to
+      // register class VR128 by method 'constrainOperandRegClass' which is
+      // directly called by 'fastEmitInst_ri'.
+      // Instruction VCVTPS2PHrr takes an extra immediate operand which is
+      // used to provide rounding control.
+      InputReg = fastEmitInst_ri(X86::VCVTPS2PHrr, RC, InputReg, false, 0);
+
+      // Move the lower 32-bits of ResultReg to another register of class GR32.
+      ResultReg = createResultReg(&X86::GR32RegClass);
+      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
+              TII.get(X86::VMOVPDI2DIrr), ResultReg)
+          .addReg(InputReg, RegState::Kill);
+
+      // The result value is in the lower 16-bits of ResultReg.
+      unsigned RegIdx = X86::sub_16bit;
+      ResultReg = fastEmitInst_extractsubreg(MVT::i16, ResultReg, true, RegIdx);
+    } else {
+      assert(Op->getType()->isIntegerTy(16) && "Expected a 16-bit integer!");
+      // Explicitly sign-extend the input to 32-bit.
+      InputReg = fastEmit_r(MVT::i16, MVT::i32, ISD::SIGN_EXTEND, InputReg,
+                            /*Kill=*/false);
+
+      // The following SCALAR_TO_VECTOR will be expanded into a VMOVDI2PDIrr.
+      InputReg = fastEmit_r(MVT::i32, MVT::v4i32, ISD::SCALAR_TO_VECTOR,
+                            InputReg, /*Kill=*/true);
+
+      InputReg = fastEmitInst_r(X86::VCVTPH2PSrr, RC, InputReg, /*Kill=*/true);
+
+      // The result value is in the lower 32-bits of ResultReg.
+      // Emit an explicit copy from register class VR128 to register class FR32.
+      ResultReg = createResultReg(&X86::FR32RegClass);
+      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
+              TII.get(TargetOpcode::COPY), ResultReg)
+          .addReg(InputReg, RegState::Kill);
+    }
+
+    updateValueMap(II, ResultReg);
+    return true;
+  }
   case Intrinsic::frameaddress: {
     MachineFunction *MF = FuncInfo.MF;
     if (MF->getTarget().getMCAsmInfo()->usesWindowsCFI())
       return false;

     Type *RetTy = II->getCalledFunction()->getReturnType();

     MVT VT;

llvm/trunk/test/CodeGen/X86/fast-isel-double-half-convertion.ll

; RUN: llc -fast-isel -fast-isel-abort -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s

; XFAIL: *

; In the future, we might want to teach fast-isel how to expand a double-to-half
; conversion into a double-to-float conversion immediately followed by a
; float-to-half conversion. For now, fast-isel is expected to fail.

define double @test_fp16_to_fp64(i32 %a) {
entry:
  %0 = trunc i32 %a to i16
  %1 = call double @llvm.convert.from.fp16.f64(i16 %0)
  ret double %1
}

define i16 @test_fp64_to_fp16(double %a) {
entry:
  %0 = call i16 @llvm.convert.to.fp16.f64(double %a)
  ret i16 %0
}

declare i16 @llvm.convert.to.fp16.f64(double)
declare double @llvm.convert.from.fp16.f64(i16)
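
The two-step expansion mentioned in the test comment (double-to-float followed by float-to-half) can be illustrated with a small software model. This is a sketch under stated assumptions, not something the patch implements: the names `floatToHalfBits` and `doubleToHalfViaFloat` are hypothetical, `float` is assumed to be IEEE 754 binary32, rounding is round-to-nearest-even, and NaN payloads are not preserved.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative model: encode a float as IEEE 754 binary16 bits using
// round-to-nearest-even.
std::uint16_t floatToHalfBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  const std::uint32_t sign = (bits >> 16) & 0x8000u;
  const std::uint32_t exp = (bits >> 23) & 0xFFu;
  std::uint32_t mant = bits & 0x7FFFFFu;
  std::uint32_t result;

  if (exp == 0xFFu) {
    // Infinity stays infinity; any NaN becomes a quiet NaN.
    result = 0x7C00u | (mant != 0 ? 0x0200u : 0u);
  } else {
    // Rebias the exponent from float (127) to half (15).
    const int e = static_cast<int>(exp) - 112;
    if (e >= 0x1F) {
      result = 0x7C00u; // too large for half: infinity
    } else if (e < -10) {
      result = 0; // below half of the smallest subnormal: zero
    } else if (e <= 0) {
      // Subnormal half: shift the 24-bit significand into place and round.
      mant |= 0x800000u;
      const int shift = 14 - e;
      result = mant >> shift;
      const std::uint32_t rem = mant & ((1u << shift) - 1u);
      const std::uint32_t halfway = 1u << (shift - 1);
      if (rem > halfway || (rem == halfway && (result & 1u)))
        ++result;
    } else {
      // Normal half: keep the top 10 mantissa bits and round the rest.
      // A carry out of the mantissa correctly bumps the exponent.
      result = (static_cast<std::uint32_t>(e) << 10) | (mant >> 13);
      const std::uint32_t rem = mant & 0x1FFFu;
      if (rem > 0x1000u || (rem == 0x1000u && (result & 1u)))
        ++result;
    }
  }
  return static_cast<std::uint16_t>(sign | result);
}

// Two-step conversion: double -> float -> half. Assumes the input is within
// the range of float.
std::uint16_t doubleToHalfViaFloat(double d) {
  return floatToHalfBits(static_cast<float>(d));
}
```

Because the value is rounded twice, the result can in rare cases differ from a single direct double-to-half rounding, which is one reason such an expansion would need care.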

llvm/trunk/test/CodeGen/X86/fast-isel-float-half-convertion.ll

; RUN: llc -fast-isel -fast-isel-abort -asm-verbose=false -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s

; Verify that fast-isel correctly expands float-half conversions.

define i16 @test_fp32_to_fp16(float %a) {
; CHECK-LABEL: test_fp32_to_fp16:
; CHECK: vcvtps2ph $0, %xmm0, %xmm0
; CHECK-NEXT: vmovd %xmm0, %eax
; CHECK-NEXT: retq
entry:
  %0 = call i16 @llvm.convert.to.fp16.f32(float %a)
  ret i16 %0
}

define float @test_fp16_to_fp32(i32 %a) {
; CHECK-LABEL: test_fp16_to_fp32:
; CHECK: movswl %di, %eax
; CHECK-NEXT: vmovd %eax, %xmm0
; CHECK-NEXT: vcvtph2ps %xmm0, %xmm0
; CHECK-NEXT: retq
entry:
  %0 = trunc i32 %a to i16
  %1 = call float @llvm.convert.from.fp16.f32(i16 %0)
  ret float %1
}

declare i16 @llvm.convert.to.fp16.f32(float)
declare float @llvm.convert.from.fp16.f32(i16)