This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
This patch includes:
- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics support for the AArch64 Armv8.6-a Matrix Multiplication instructions (no bfloat16 matrix multiplication)
No IR types or C types are needed for this extension.
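As an illustration of what the new matrix multiply-accumulate instructions compute, here is a minimal portable sketch of their semantics. It assumes the element ordering used by the Arm Architecture Reference Manual pseudocode for SMMLA/USMMLA: the first operand is a 2x8 matrix of 8-bit values in row-major order, the second operand is an 8x2 matrix stored as its 2x8 transpose, and the result is added into a 2x2 matrix of 32-bit accumulators. The helpers smmla_ref and usmmla_ref are hypothetical names used only for this sketch; they are not part of this patch or of ACLE. On Armv8.6-a hardware the corresponding ACLE intrinsics are vmmlaq_s32 and vusmmlaq_s32 from arm_neon.h.

```c
#include <stdint.h>

/* Hypothetical scalar reference model of SMMLA (signed x signed).
 * acc: 2x2 int32 accumulator, row-major (acc[2*i + j]).
 * a:   2x8 int8 matrix, row-major (a[8*i + k]).
 * b:   8x2 int8 matrix stored as its 2x8 transpose (b[8*j + k]).
 * Computes acc[i][j] += sum over k of a[i][k] * b[k][j].
 * Does not model 32-bit wraparound of the accumulator on overflow. */
static void smmla_ref(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            int32_t sum = acc[2 * i + j];
            /* 8-way dot product of row i of a with column j of b */
            for (int k = 0; k < 8; k++)
                sum += (int32_t)a[8 * i + k] * (int32_t)b[8 * j + k];
            acc[2 * i + j] = sum;
        }
    }
}

/* Hypothetical model of USMMLA: same layout and arithmetic, except that
 * the first operand holds unsigned 8-bit values and the second signed. */
static void usmmla_ref(int32_t acc[4], const uint8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            int32_t sum = acc[2 * i + j];
            for (int k = 0; k < 8; k++)
                sum += (int32_t)a[8 * i + k] * (int32_t)b[8 * j + k];
            acc[2 * i + j] = sum;
        }
    }
}
```

The two models differ only in how the first operand's bytes are interpreted: a byte of 0xFF contributes 255 under USMMLA but -1 under SMMLA, which is the distinction the mixed-sign instructions exist to express.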
This is part of a patch series that starts with BFloat16 support and
the other components of the Armv8.6-a extension (see the previous
patches linked in Phabricator).
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Is it possible to use -sroa here as you did for the tests added in D77872? If so, I think this might make some of the _lane tests below a bit easier to follow.