This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Support for v4f16 and v8f16 vectors
ClosedPublic

Authored by SjoerdMeijer on Mar 15 2018, 1:23 PM.

Details

Summary

This is the groundwork for the Armv8.2-A FP16
vector intrinsics, which use v4f16 and v8f16 vector operands
and return values. All the moving parts are tested with two
intrinsics: a 1-operand v8f16 intrinsic and a 2-operand v4f16
intrinsic. The rest of the intrinsics and tests will be added in a
follow-up patch.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Mar 15 2018, 1:23 PM

It looks like there's nothing testing the bitconvert patterns here.

Also, as a general question, didn't the Clang changes go in (and get reverted) before this? That seems pretty dodgy to me; I'm not entirely happy about having user-visible but broken intrinsics in arm_acle.h.

lib/Target/ARM/ARMInstrNEON.td
7020

I know big-endian is a pain in the neck that hardly anyone actually uses, but we probably shouldn't just let it bit-rot.

Hi Tim,
Thanks for your comments. I will try to answer your general question first (if I understand it correctly).
There should be no dodgy business going on here. I am trying to avoid exactly that, and I reverted the
Clang and user-visible part when it showed problems in testing. Please note that this reverted only the
A32 intrinsics; the A64 ones are still in and should be okay. The FP16 A32 intrinsics were behaving as
expected, but there was some interaction with the existing intrinsics and __fp16. The reason for that is
that the author of the intrinsics patches changed them to pass vectors of f16 instead of vectors of i16.
With the user-visible part reverted, I tried fixing the backend first with this patch. But I've just noticed
that this patch doesn't solve that case, which is obviously also what I tried to achieve here. There are
still some issues with legalising f16 vectors (when fullfp16 is not enabled). I am now first going to
rethink why we want to pass f16 vectors instead of sticking to i16s.
Cheers.

Ah good, thanks for the explanation. Sounds like you were already doing exactly what I was thinking of and I started paying attention half-way through.

I am now first going to rethink why we want to pass f16 vectors instead of sticking to i16.

I think either could work. Longer term we probably want <N x half> for tidiness, but it's probably not essential to begin with.

I have uploaded the companion Clang patch: https://reviews.llvm.org/D44561
This passes the half type when it is appropriate to do so, and i16 otherwise.
Thus, the normal NEON intrinsics work as before and as expected, and with this
LLVM patch for f16 vectors, the FP16 vector intrinsics are fine too.
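
To illustrate the split (a sketch with hypothetical intrinsic names; the real builtins live in
D44561), the difference is purely in the IR types the intrinsics take and return:

  ; Hypothetical names, sketch only: the existing integer NEON intrinsics keep
  ; their i16 vector types, while the FP16 vector intrinsics use half vectors.
  declare <4 x i16> @legacy.i16.intrinsic(<4 x i16>, <4 x i16>)
  declare <4 x half> @fp16.vector.intrinsic(<4 x half>, <4 x half>)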

About the bitconverts: they were actually necessary to get the code generation
working for the 2 intrinsics in the regression tests. The tests run in hard and
softfp mode, and the bitconvert patterns were needed for the softfp case,
but I will double-check to be sure.
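
For reference, a minimal sketch (not the test from this patch) of the kind of IR that
exercises those patterns: a bitcast between an f16 vector and another 64-bit or 128-bit
vector type has to select one of the new bitconvert patterns:

  ; Sketch only: moving v4f16/v8f16 values across a bitcast selects the
  ; bitconvert patterns added in ARMInstrNEON.td.
  define <4 x i16> @bc_v4f16(<4 x half> %a) {
    %r = bitcast <4 x half> %a to <4 x i16>
    ret <4 x i16> %r
  }

  define <2 x i64> @bc_v8f16(<8 x half> %a) {
    %r = bitcast <8 x half> %a to <2 x i64>
    ret <2 x i64> %r
  }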

If we are happy with this, then I will first add (or rather, re-enable) the AArch32
FP16 vector intrinsics in Clang, and then complete the rest of the LLVM
codegen tests.

Yep, I thought the bitconverts were a good idea, but they should also have the variants that trigger for IsBE.

Yep, I thought the bitconverts were a good idea, but they should also have the variants that trigger for IsBE.

Ah yes, sorry, I misunderstood and missed that. Will fix that.

Added big-endian bitconvert patterns and test.
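
A sketch of the kind of big-endian RUN line such a test might use (assumed, not copied
from the committed test); the triple and feature names here are illustrative:

  ; Sketch only: the same bitconvert patterns, selected for a big-endian triple.
  ; RUN: llc -mtriple=armebv8a-none-eabi -mattr=+neon,+fullfp16 %s -o -
  define <4 x i16> @bc_v4f16_be(<4 x half> %a) {
    %r = bitcast <4 x half> %a to <4 x i16>
    ret <4 x i16> %r
  }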

t.p.northover accepted this revision.Mar 19 2018, 4:58 AM

Thanks. This looks fine now too.

This revision is now accepted and ready to land.Mar 19 2018, 4:58 AM
This revision was automatically updated to reflect the committed changes.