[Clang][AArch64] Add SME outer product intrinsics
Needs Review · Public

Authored by sagarkulkarni19 on Sep 26 2022, 2:57 PM.

Details

Summary

This patch adds support for the following SME ACLE intrinsics:

  • svmopa_za32[_bf16] // Also for f16, u8, s8, f32
  • svmopa_za64[_u16] // Also for s16, f64
  • svmops_za32[_bf16] // Also for f16, u8, s8, f32
  • svmops_za64[_u16] // Also for s16, f64
  • svsumopa_za32[_s8]
  • svsumopa_za64[_s16]
  • svusmopa_za32[_u8]
  • svusmopa_za64[_u16]
  • svsumops_za32[_s8]
  • svsumops_za64[_s16]
  • svusmops_za32[_u8]
  • svusmops_za64[_u16]
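
As a rough illustration of what the outer-product intrinsics listed above compute, here is a scalar sketch in plain C. It is not the ACLE API and not code from this patch: `model_mop`, `DIM`, and the `bool` arrays standing in for predicates are hypothetical names chosen for the example. It models only the non-widening f32 case, assuming that a tile element is updated only when both its row predicate and its column predicate are active, and is otherwise left unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tile dimension for the sketch. On real hardware the tile
   size follows from the streaming vector length, not a constant. */
#define DIM 4

/* Scalar model of a predicated outer product on a ZA-like tile.
   subtract == false models an accumulate (MOPA-style) update,
   subtract == true  models a subtract   (MOPS-style) update.
   Elements whose row (pn) or column (pm) predicate is inactive
   are left unmodified. */
static void model_mop(float za[DIM][DIM],
                      const bool pn[DIM], const bool pm[DIM],
                      const float zn[DIM], const float zm[DIM],
                      bool subtract)
{
    for (size_t i = 0; i < DIM; ++i) {
        for (size_t j = 0; j < DIM; ++j) {
            if (!pn[i] || !pm[j])
                continue; /* inactive element: keep the old tile value */
            float prod = zn[i] * zm[j];
            za[i][j] += subtract ? -prod : prod;
        }
    }
}
```

Calling `model_mop` once with `subtract` set to false and then once with it set to true, using the same inputs, should return the tile to its starting values, which is a quick sanity check of the accumulate/subtract pairing.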

Event Timeline

Herald added a project: Restricted Project. · Sep 26 2022, 2:57 PM
sagarkulkarni19 requested review of this revision. · Sep 26 2022, 2:57 PM

Thanks for the patch. This is going to be inconvenient, sorry, but: while implementing the specification in GCC, I noticed that the ZA functions weren't consistent about whether they had an _m suffix. svwrite (MOVA) had one, but the MOP intrinsics that you're implementing here didn't. Since SME2 does have some unpredicated instructions, it seems like it would be better to make the MOP intrinsics consistent with svwrite, with an _m suffix.

I've created https://github.com/ARM-software/acle/pull/218 for that change. Please let me know if it looks reasonable to you.

Thanks for letting me know. I can update the MOP and ADD intrinsics to add an _m suffix.
Yes, this looks reasonable to me.