Page MenuHomePhabricator

Support __fp16 vectors
ClosedPublic

Authored by ahatanak on Apr 25 2017, 10:51 PM.

Details

Summary

Currently, clang miscompiles operations on __fp16 vectors.

For example, when the following code is compiled:

typedef __fp16 half4 __attribute__ ((vector_size (8)));
half4 hv0, hv1, hv2;

void test() {
  hv0 = hv1 + hv2;
}

clang generates the following IR on ARM64:

%1 = load <4 x half>, <4 x half>* @hv1, align 8
%2 = load <4 x half>, <4 x half>* @hv2, align 8
%3 = fadd <4 x half> %1, %2
store <4 x half> %3, <4 x half>* @hv0, align 8

This isn't correct since fp16 values in C or C++ expressions have to be promoted to float if fp16 is not a natively supported type (see gcc's documentation).

https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html

The IR is incorrect on X86 too. The addition is done on <4xi16>vectors:

%1 = load <4 x i16>, <4 x i16>* @hv1, align 8
%2 = load <4 x i16>, <4 x i16>* @hv2, align 8
%3 = add <4 x i16> %2, %1
store <4 x i16> %3, <4 x i16>* @hv0, align 8

This patch makes the changes needed in Sema and IRGen to generate the correct IR on targets that set HalfArgsAndReturns to true but don't support fp16 natively (ARM and ARM64). It inserts implicit casts to promote fp16 vector operands to float vectors and truncate the result back to a __fp16 vector.

I plan to fix X86 and other targets that don't set HalfArgsAndReturns to true in another patch.

Diff Detail

Repository
rL LLVM

Event Timeline

ahatanak created this revision.Apr 25 2017, 10:51 PM
bruno edited edge metadata.Jun 1 2017, 5:46 PM

Hi Akira,

This is nice, thanks for doing it!

include/clang/Sema/Sema.h
9270 ↗(On Diff #100354)

Can you change CompAssign to IsCompAssign so that we match the pattern used in other methods?

9279 ↗(On Diff #100354)

Same here.

lib/CodeGen/CGExprScalar.cpp
997 ↗(On Diff #100354)

Please also add a comment explaining what's going on here, like we see for other snippets of logic above.

It also sounds like this is more generic than it should (which can have unexpected side effects due to the lack of testcases covering vector with other element sizes). I suggest you either (a) add testcases for other sizes or (b) make the condition more restrictive to be sure you're only changing the logic you intend to (i.e., half and i16).

After these changes, if it makes sense, can you refactor the logic under this condition into its own function? Seems like this function is too big already.

lib/Sema/SemaExpr.cpp
11433 ↗(On Diff #100354)

Assuming we're able to handle other vector types here, is it in ConvertHalfVec really necessary? It seems odd to me that we need to special case it in every case below.

test/Sema/fp16vec-sema.c
47 ↗(On Diff #100354)

Can you add a FIXME here?

ahatanak updated this revision to Diff 102642.Jun 15 2017, 1:30 AM
ahatanak marked 3 inline comments as done.

Address review comments.

ahatanak added inline comments.Jun 15 2017, 1:30 AM
lib/CodeGen/CGExprScalar.cpp
997 ↗(On Diff #100354)

I'm surprised that we don't have many code-gen tests for vectors, but I can add more test cases that are the same as the tests in fp16vec-ops.c except that the element types are different.

Just to be clear, the only change I'm making here is to handle cases in which the source and destination have different sizes. That shouldn't happen when for example an i8 vector is converted to an i32 because if it did happen, that would have previously caused an assertion when calling CreateBitCast (bitcast requires the sizes of the destination and source be the same).

Also, I removed the lambda and instead used Type::getPrimitiveSizeInBit to compute the vector size.

If this looks OK, I'll try refactoring this function.

lib/Sema/SemaExpr.cpp
11433 ↗(On Diff #100354)

ConvertHalfVec is needed to distinguish operations that involve vectors of half from those that don't. If the operands are vectors of half, convertHalfVecBinOp is called to promote the operands and truncate the result.

bruno added inline comments.Sep 19 2017, 4:02 PM
lib/CodeGen/CGExprScalar.cpp
1042 ↗(On Diff #115304)

What happens if the SrcElementTy is float and DstElementTy isn't? Seems like it will hit the assertion below.

lib/Sema/SemaExpr.cpp
7511 ↗(On Diff #115304)

Since it seems that you're always doing the same conversion (the only variable input here is the number of elements), can you update the comment to mention the exact conversion?

8072 ↗(On Diff #115304)

What about Convert vector E to a vector with the same number of elements but different element type?

11316 ↗(On Diff #115304)

which are of a half vector type -> should be there an assertion below to make sure?

11329 ↗(On Diff #115304)

Can you add a comment explaining that a) this conversion from float -> int and b) it's needed in case the original binop is a comparison?

11537 ↗(On Diff #115304)

What about of half vector and truncating the result to of half vector to float and truncating the result back to half

11568 ↗(On Diff #115304)

Please add a comment here explaining that this path only happens when it's a compound assignment.

11978 ↗(On Diff #115304)

This same logic is used elsewhere in the patch, perhaps factor it out into a static function?

This revision was automatically updated to reflect the committed changes.
ahatanak reopened this revision.Sep 19 2017, 11:39 PM

Reopening this review. I accidentally closed this review when I committed r313717 (the patch that adds support for 'noescape').

ahatanak updated this revision to Diff 115966.Sep 19 2017, 11:43 PM

Upload the rebased patch again.

ahatanak updated this revision to Diff 116275.Sep 21 2017, 2:57 PM
ahatanak marked 8 inline comments as done.

Address review comments.

lib/CodeGen/CGExprScalar.cpp
1042 ↗(On Diff #115304)

That's not supposed to happen since there are only three cases we have to consider here:

  1. truncation from an int vector to a short vector
  2. promotion from a half vector to a float vector
  3. truncation from a float vector to a half vector

I've cleaned up the assertions and moved them up to make the code clearer.

lib/Sema/SemaExpr.cpp
7511 ↗(On Diff #115304)

It turns out this code isn't needed at all. I accidentally left the code in that I had in my local branch. The conversions between half vectors and float vectors take place after the operands are checked, so both the LHS and the RHS are still half vectors at this point.

Sorry for causing confusion.

11537 ↗(On Diff #115304)

I also changed the comment in CreateBuiltinUnaryOp.

11978 ↗(On Diff #115304)

I factored it out to function 'needsConversionOfHalfVec'.

bruno accepted this revision.Sep 22 2017, 11:03 AM

Thanks Akira.

This revision is now accepted and ready to land.Sep 22 2017, 11:03 AM
This revision was automatically updated to reflect the committed changes.