This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add patterns for FMADD, FMSUB
ClosedPublic

Authored by overmighty on Aug 15 2023, 11:17 AM.

Details

Summary

FMADD, FMSUB instructions perform better or the same compared to indexed
FMLA, FMLS.

For example, the Arm Cortex-A55 Software Optimization Guide lists "FP
multiply accumulate" FMADD, FMSUB instructions with a throughput of 2
IPC, whereas it lists "ASIMD FP multiply accumulate, by element" FMLA,
FMLS with a throughput of 1 IPC.

The Arm Cortex-A77 Software Optimization Guide, however, does not
separately list "by element" variants of the "ASIMD FP multiply
accumulate" instructions, which are listed with the same throughput of 2
IPC as "FP multiply accumulate" instructions.

Diff Detail

Event Timeline

overmighty created this revision.Aug 15 2023, 11:17 AM
overmighty requested review of this revision.Aug 15 2023, 11:17 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 15 2023, 11:17 AM

It looks like there are patterns for A[0]*B+C and B*A[0]*C to handle commutivity. Are there tests for both forms too?

samtebbs accepted this revision.Aug 17 2023, 2:09 AM

Thanks, this looks good to me with any extra tests required by David's comment.

This revision is now accepted and ready to land.Aug 17 2023, 2:09 AM

Add tests for A[0]*B+C patterns.

If this still looks good to you, please commit it as "OverMighty <its.overmighty@gmail.com>". Thank you.

The crashes in complex-deinterleaving-f16-mul.ll and fp16_intrinsic_lane.ll seem to be caused by D156831.

dzhidzhoev added inline comments.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
5413–5425

The result type of (EXTRACT_SUBREG ...) here seems to be deduced to i16, which causes an assertion after applying this rule.
Explicit specification of the result type fixes crashes of tests:

FPR16:$Rn, (f16 (EXTRACT_SUBREG V128:$Rm, hsub)), FPR16:$Ra)>;
(f16 (EXTRACT_SUBREG V128:$Rn, hsub)), FPR16:$Rm, FPR16:$Ra)>
dzhidzhoev added inline comments.Aug 18 2023, 3:26 PM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
5425

BTW, these lines add some patterns for fnmadd. Could you add some tests for them? Or are they already covered by existing ones?

dzhidzhoev added inline comments.Aug 18 2023, 3:48 PM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
5450

Is it possible to use extractelt here? Since vector_extract is marked as deprecated in TargetSelectionDAG.td

overmighty added a reviewer: dzhidzhoev.
  • Rebase new upstream commits.
  • Replace usage of deprecated vector_extract with extractelt in new patterns.
  • Add f16 operators before EXTRACT_SUBREGs in new *Hrrr instruction pattern results to fix crashing tests.
overmighty added inline comments.Aug 19 2023, 8:22 AM
llvm/lib/Target/AArch64/AArch64InstrFormats.td
5425

The patterns for FNMADD and FNMSUB are useless as there are no Neon vector equivalents of these instructions. Should I add a template argument to ThreeOperandFPData to prevent the patterns from being generated for them?

I previously had patterns in AArch64InstrInfo.td for FMADD and FMSUB only and it was suggested to me to move them to ThreeOperandFPData: https://reviews.llvm.org/D153207?id=532433#inline-1481961. I was asked to split that part of my previous patch into a second (this) one.

5450

I saw the comment marking it as deprecated but I also saw new commits using vector_extract. I should have asked for clarification earlier. Is vector_extract truly deprecated? Should I change patterns I added in previous patches to use extractelt too?

dzhidzhoev accepted this revision.Aug 23 2023, 9:11 AM
dzhidzhoev added a subscriber: arsenm.
dzhidzhoev added inline comments.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
5425

The patterns for FNMADD and FNMSUB are useless as there are no Neon vector equivalents of these instructions. Should I add a template argument to ThreeOperandFPData to prevent the patterns from being generated for them?

Thank you for the explanation! There's no need for it since I can barely find a way it can interfere with other patterns.

5450

Thank you! Now I've noticed that the comment about the deprecation was made in 2015, and vector_extract is still in use. @arsenm, could you suggest whether it is relevant?

Ping.

If you would like the current patch with extractelt to be committed, please commit it as "OverMighty <its.overmighty@gmail.com>". Thank you.

This revision was landed with ongoing or failed builds.Aug 30 2023, 3:39 AM
This revision was automatically updated to reflect the committed changes.