This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][X86] Fix handling of __fp16 vectors
ClosedPublic

Authored by ahatanak on Nov 15 2017, 4:46 PM.

Details

Summary

IRGen for __fp16 vectors on X86 is currently completely broken. For example when the following code is compiled:

half4 hv0, hv1, hv2; // these are vectors of __fp16.

void foo221() {
  hv0 = hv1 + hv2;
}

clang generates the following IR, in which two i16 values are added:

@hv1 = common global <4 x i16> zeroinitializer, align 8
@hv2 = common global <4 x i16> zeroinitializer, align 8
@hv0 = common global <4 x i16> zeroinitializer, align 8

define void @foo221() {
entry:
  %0 = load <4 x i16>, <4 x i16>* @hv1, align 8
  %1 = load <4 x i16>, <4 x i16>* @hv2, align 8
  %add = add <4 x i16> %0, %1
  store <4 x i16> %add, <4 x i16>* @hv0, align 8
  ret void
}

To fix IRGen for fp16 vectors, this patch uses the code committed in r314056, which modified clang to promote and truncate fp16 vectors to and from float vectors in the AST. Also, as the first step toward doing away with the fp16 conversion intrinsics such as @llvm.convert.to.fp16 (see http://lists.llvm.org/pipermail/llvm-dev/2014-July/074689.html), I made changes to IRGen for fp16 scalars so that fpext/fptrunc instructions are emitted instead of the fp16 conversion intrinsics IRGen currently emits. This fixes another IRGen bug where a short value is assigned to an fp16 variable without any integer-to-floating-point conversion, as shown in the following example:

C code

__fp16 a;
short b;

void foo1() {
  a = b;
}

generated IR

@b = common global i16 0, align 2
@a = common global i16 0, align 2

define void @foo1() #0 {
entry:
  %0 = load i16, i16* @b, align 2
  store i16 %0, i16* @a, align 2
  ret void
}

I haven't spent too much time inspecting the code the X86 backend emits, but the code I've seen so far seems at least functionally correct (although it doesn't look very efficient since the backend scalarizes __fp16 vectors).

Diff Detail

Event Timeline

ahatanak created this revision.Nov 15 2017, 4:46 PM
bruno added inline comments.Nov 27 2017, 11:53 AM
lib/CodeGen/CGExprScalar.cpp
954

This (and in the other places in the patch) means that regardless of HalfArgsAndReturns state we want to generate an intrinsic call if useFP16ConversionIntrinsics() is true, is that always the intended behavior?

ahatanak added inline comments.Nov 27 2017, 12:08 PM
lib/CodeGen/CGExprScalar.cpp
954

Yes, that is the intended behavior.

HalfArgsAndReturns is used here to determine whether intrinsic calls should be emitted, but it seems to me that it should only be used to indicate whether returning or passing half types is allowed. Currently ARM and ARM64 are the only targets that are allowed to return or pass half types, but I think it's possible to allow other targets to do so too if that's desirable.

Any other comments from anyone?

bruno accepted this revision.Dec 4 2017, 11:29 AM

LGTM with the minor comment below.

include/clang/Basic/TargetInfo.h
563

Add a "FIXME"

This revision is now accepted and ready to land.Dec 4 2017, 11:29 AM
This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.