This is an archive of the discontinued LLVM Phabricator instance.

Differential D4558

NVPTX: support f64 <-> f16 intrinsics
Closed (Public)

Authored by t.p.northover on Jul 17 2014, 5:41 AM.


Details

Reviewers

jholewinski

Summary

Hi,

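I'm in the process of slightly reworking how we handle the __fp16 type. I have larger goals, but the most important immediate one is to perform extensions and truncations in one step (for example, rounding a double directly to half, rather than rounding it to float first and then rounding that float to half) so that this C code has IEEE-sensible semantics: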

void my_round(double in, __fp16 *out) { *out = in; }

Now, I *think* this is fairly academic as far as OpenCL is concerned (you have to use the vload_half/vstore_half functions to access __fp16 at all times), but I'd like to minimise breakage as far as possible anyway.

As part of this, I've made the @llvm.convert.from.fp16 and @llvm.convert.to.fp16 intrinsics polymorphic (overloaded on the floating-point type, e.g. @llvm.convert.from.fp16.f64), and would like to add support for the f64 variants in as many places as possible.

NVPTX already appeared to have the instructions (cvt.f64.f16 and cvt.rn.f16.f64) waiting to be used, so I added a couple of patterns and a test.

Are you happy for me to commit the change?

Cheers.

Tim.

Diff Detail

Event Timeline

t.p.northover updated this revision to Diff 11573. (Jul 17 2014, 5:41 AM)

t.p.northover retitled this revision to NVPTX: support f64 <-> f16 intrinsics.

t.p.northover updated this object.

t.p.northover edited the test plan for this revision.

t.p.northover added a subscriber: Unknown Object (MLST).

Herald added a subscriber: jholewinski. (Jul 17 2014, 5:41 AM)

t.p.northover updated this object. (Jul 17 2014, 5:41 AM)

This LGTM. Thanks for implementing this!

This revision is now accepted and ready to land. (Jul 17 2014, 6:59 AM)

Thanks Justin. I've committed this as r213356.

Revision Contents

Path                                   Size
lib/Target/NVPTX/NVPTXIntrinsics.td    5 lines
test/CodeGen/NVPTX/fp16.ll             45 lines

Diff 11573

lib/Target/NVPTX/NVPTXIntrinsics.td

Context not available.
 def : Pat<(i16 (fp_to_f16 Float32Regs:$a)),
           (CVT_f16_f32 Float32Regs:$a, CvtRN)>;

+def : Pat<(f64 (f16_to_fp Int16Regs:$a)),
+          (CVT_f64_f16 Int16Regs:$a, CvtNONE)>;
+def : Pat<(i16 (fp_to_f16 Float64Regs:$a)),
+          (CVT_f16_f64 Float64Regs:$a, CvtRN)>;
+
 //
 // Bitcast
 //
Context not available.

test/CodeGen/NVPTX/fp16.ll

This file was added.

; RUN: llc -march=nvptx -verify-machineinstrs < %s | FileCheck %s

declare float @llvm.convert.from.fp16.f32(i16) nounwind readnone
declare double @llvm.convert.from.fp16.f64(i16) nounwind readnone
declare i16 @llvm.convert.to.fp16.f32(float) nounwind readnone
declare i16 @llvm.convert.to.fp16.f64(double) nounwind readnone

; CHECK-LABEL: @test_convert_fp16_to_fp32
; CHECK: cvt.f32.f16
define void @test_convert_fp16_to_fp32(float addrspace(1)* noalias %out, i16 addrspace(1)* noalias %in) nounwind {
  %val = load i16 addrspace(1)* %in, align 2
  %cvt = call float @llvm.convert.from.fp16.f32(i16 %val) nounwind readnone
  store float %cvt, float addrspace(1)* %out, align 4
  ret void
}


; CHECK-LABEL: @test_convert_fp16_to_fp64
; CHECK: cvt.f64.f16
define void @test_convert_fp16_to_fp64(double addrspace(1)* noalias %out, i16 addrspace(1)* noalias %in) nounwind {
  %val = load i16 addrspace(1)* %in, align 2
  %cvt = call double @llvm.convert.from.fp16.f64(i16 %val) nounwind readnone
  store double %cvt, double addrspace(1)* %out, align 8
  ret void
}


; CHECK-LABEL: @test_convert_fp32_to_fp16
; CHECK: cvt.rn.f16.f32
define void @test_convert_fp32_to_fp16(i16 addrspace(1)* noalias %out, float addrspace(1)* noalias %in) nounwind {
  %val = load float addrspace(1)* %in, align 4
  %cvt = call i16 @llvm.convert.to.fp16.f32(float %val) nounwind readnone
  store i16 %cvt, i16 addrspace(1)* %out, align 2
  ret void
}


; CHECK-LABEL: @test_convert_fp64_to_fp16
; CHECK: cvt.rn.f16.f64
define void @test_convert_fp64_to_fp16(i16 addrspace(1)* noalias %out, double addrspace(1)* noalias %in) nounwind {
  %val = load double addrspace(1)* %in, align 8
  %cvt = call i16 @llvm.convert.to.fp16.f64(double %val) nounwind readnone
  store i16 %cvt, i16 addrspace(1)* %out, align 2
  ret void
}