
[AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics

Authored by LukeGeeson on Apr 10 2020, 6:33 AM.



This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:

This patch includes:

  • Assembly support for AArch64 only (no SVE or Neon)
  • Intrinsics support for the AArch64 Armv8.6-a matrix multiplication instructions (no bfloat16 matrix multiplication)

No new IR types or C types are needed for this extension.
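
For readers unfamiliar with the extension, the following is a minimal sketch in portable C of what the signed 8-bit integer matrix multiply-accumulate operation computes, as I understand the architecture: each 128-bit source is read as a 2x8 matrix of 8-bit values, and the 2x2 matrix of 32-bit sums is added to the accumulator. The function name `smmla_ref` and the array layout are illustrative assumptions; this is not code from the patch. In ACLE the corresponding intrinsic should be `vmmlaq_s32`, which needs an AArch64 target with the i8mm feature enabled.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of the patch).
 * a and b each hold a 2x8 matrix of signed 8-bit values in row-major
 * order (16 bytes). acc holds a 2x2 matrix of 32-bit sums in row-major
 * order. Computes acc += a * transpose(b). */
void smmla_ref(int32_t acc[4], const int8_t a[16], const int8_t b[16])
{
    for (int i = 0; i < 2; ++i) {          /* row of a */
        for (int j = 0; j < 2; ++j) {      /* row of b (column of b^T) */
            int32_t sum = 0;
            for (int k = 0; k < 8; ++k)    /* widen to 32 bits, then multiply */
                sum += (int32_t)a[8 * i + k] * (int32_t)b[8 * j + k];
            acc[2 * i + j] += sum;
        }
    }
}
```

A portable model like this can serve as an oracle when checking the results of the intrinsics on hardware or under emulation. It also shows why no new C types are required: the operands fit in the existing 128-bit integer vector types.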

This is part of a patch series that starts with BFloat16 support and
covers the other components of the Armv8.6-a extension (see the previous
patches linked in Phabricator).

Based on work by:

  • Luke Geeson
  • Oliver Stannard
  • Luke Cheeseman

Event Timeline

LukeGeeson created this revision. Apr 10 2020, 6:33 AM

Removed reliance on the parent revision; Harbormaster now builds with unit tests passing.

kmclaughlin added inline comments.

Is it possible to use -sroa here as you did for the tests added in D77872? If so, I think this might make some of the _lane tests below a bit easier to follow.


The arrangement specifiers of the first two operands don't match for these tests, which is what the next set of tests below is checking for. It might be worth keeping these tests specific to just the index being out of range.


muct -> must :)

LukeGeeson marked 5 inline comments as done.
  • fixed typos
  • added sroa as mem2reg arg to reduce redundant mem accesses in tests, refactored test
  • addressed other comments
kmclaughlin accepted this revision. Apr 23 2020, 2:35 AM

Thanks for the updates, @LukeGeeson, LGTM.

This revision is now accepted and ready to land. Apr 23 2020, 2:35 AM
This revision was automatically updated to reflect the committed changes.