Enable fma formation for fp16 on x86 and aarch64
Needs Review, Public

Authored by scanon on Fri, Jan 11, 5:48 AM.

Details

Summary

On these targets, isFMAFasterThanFMulAndFAdd for fp16 should return the same value as for fp32 when fp16 is legalized by extending to fp32 operations. On aarch64, when fp16 arithmetic is supported directly, the value should simply be true.
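
The decision described above can be sketched in isolation. This is only an illustrative model, not the patch itself: `FPType`, `TargetModel`, its fields, and `isFMAFasterSketch` are hypothetical names invented for this example and are not LLVM APIs. It assumes the target already knows its answer for fp32 and fp64 and only shows how the fp16 case would follow from that.

```cpp
// Hypothetical stand-ins for illustration; these are NOT LLVM types.
enum class FPType { F16, F32, F64 };

struct TargetModel {
  bool hasNativeFP16;   // assumption: fp16 arithmetic is supported directly
  bool fmaFasterForF32; // what the target already reports for fp32
  bool fmaFasterForF64; // what the target already reports for fp64
};

// Sketch of the rule in the summary:
//  - native fp16 arithmetic: FMA formation is simply profitable;
//  - fp16 legalized by extending to fp32: reuse the fp32 answer.
bool isFMAFasterSketch(const TargetModel &T, FPType Ty) {
  switch (Ty) {
  case FPType::F16:
    return T.hasNativeFP16 ? true : T.fmaFasterForF32;
  case FPType::F32:
    return T.fmaFasterForF32;
  case FPType::F64:
    return T.fmaFasterForF64;
  }
  return false; // unreachable for the enumerators above
}
```

With this model, a target that extends fp16 to fp32 inherits whatever it reports for fp32, so the two types cannot disagree.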

Event Timeline

scanon created this revision. Fri, Jan 11, 5:48 AM
fhahn added a subscriber: fhahn.

The AArch64 side LGTM. Also adding Oliver and Sjoerd, as they worked on the FP16 side as well, in case they have any thoughts.

The only public documentation for Arm v8.2+ CPUs I could find indicates that FADD, FMUL, and FMADD have the same latencies and throughputs on Cortex-A75: https://static.docs.arm.com/101398/0200/arm_cortex_a75_software_optimization_guide_v2.pdf

For X86, I think a test would be great.

> The AArch64 side LGTM. Also adding Oliver and Sjoerd, as they worked on the FP16 side as well, in case they have any thoughts.

I did a similar exercise not so long ago for AArch32, M-cores (see https://reviews.llvm.org/D53314). I haven't looked into this for AArch64, but what I wanted to say is that checking the software optimisation guides is the right thing to do. The A55 is also a v8.2 core with FP16, and it has a public guide here: http://infocenter.arm.com/help/topic/com.arm.doc.epm128372/arm_cortex_a55_software_optimization_guide_v2.pdf