Page MenuHomePhabricator

[AArch64] Implement FP16FML intrinsics
ClosedPublic

Authored by bryanpkc on Oct 23 2018, 9:41 PM.

Details

Summary

Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.

Based on a patch by Gao Yiling.

Diff Detail

Repository
rL LLVM

Event Timeline

bryanpkc created this revision.Oct 23 2018, 9:41 PM
t.p.northover accepted this revision.Oct 24 2018, 10:52 AM

I think this is reasonable.

This revision is now accepted and ready to land.Oct 24 2018, 10:52 AM

I think this is reasonable.

Thanks Tim. Could you also review D53632, which is the LLVM part of this implementation?

This revision was automatically updated to reflect the committed changes.
ab added a subscriber: ab.Feb 14 2019, 4:30 PM
ab added inline comments.
cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12

Hey folks, I'm curious: where does the "_u32" suffix come from? Should it be _f16?

Also, are there any new ACLE/intrinsic list documents? As far as I can tell there hasn't been any release since IHI0073B/IHI0053D.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2019, 4:30 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
SjoerdMeijer added inline comments.Feb 15 2019, 2:32 AM
cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12

Also, are there any new ACLE/intrinsic list documents? As far as I can tell there hasn't been any release since IHI0073B/IHI0053D.

I've checked, and an updated ACLE that includes these FP16FML intrinsics is coming soon.

where does the "_u32" suffix come from? Should it be _f16?

Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to make much sense. Looks like the spec says _u32, and that's also what GCC has implemented. I think we want to update the spec and fix the name before the updated spec is available. Will chase this, and let you know once I know more.

SjoerdMeijer added inline comments.Feb 15 2019, 6:52 AM
cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12

An update on this: we should change this to _f32 (because the first suffixes were refering to the ouput type). The ACLE will be updated accordingly, and also GCC will change its current implementation (from _u32 to _f32). Many thanks for raising this issue.
Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week.

ab added inline comments.Feb 15 2019, 2:09 PM
cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12

I've checked, and an updated ACLE that includes these FP16FML intrinsics is coming soon.

Great, thanks!

An update on this: we should change this to _f32 (because the first suffixes were refering to the ouput type).

Hmm, I was thinking _f16 based on the vmlal intrinsics: they seem to be named after the multiplication type rather than that of the accumulator/output.

Either way seems fine to me though, I'll defer to you folks.

The ACLE will be updated accordingly, and also GCC will change its current implementation (from _u32 to _f32). Many thanks for raising this issue.

Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week.

Sure: D58306 (with _f16 though, let me know what you think of vmlal)

Thanks for checking!

FYI: a new ACLE version has been published, please find it here: https://developer.arm.com/architectures/system-architectures/software-standards/acle

The "Neon Intrinsics" section contains these new intrinsics.