This is an archive of the discontinued LLVM Phabricator instance.

[AArch32] Armv8.6-a Matrix Mult Assembly + Intrinsics
ClosedPublic

Authored by LukeGeeson on Apr 10 2020, 6:37 AM.

Details

Summary

This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

This patch includes:

  • Assembly support for AArch32
  • Intrinsics Support for AArch32 Neon Intrinsics for Matrix Multiplication

Note: these extensions are optional in the 8.6a architecture and so have
to be enabled by default

No additional IR types or C Types are needed for this extension.

This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)

Based on work by:

  • Luke Geeson
  • Oliver Stannard
  • Luke Cheeseman

Diff Detail

Event Timeline

LukeGeeson created this revision.Apr 10 2020, 6:37 AM

Harbourmaster now biulds, and unit tests pass, errors related to linting

miyuki added a subscriber: miyuki.Apr 15 2020, 10:02 AM
miyuki added inline comments.
clang/test/CodeGen/arm-v8.6a-neon-intrinsics.c
4

Can you try -sroa after -mem2reg? I think it should eliminate some more useless stores and loads.

llvm/lib/Target/ARM/ARMInstrNEON.td
4839

Why was this added?

  • added -sroa to arm-v8.6a-neon-intrinsics test, updated test to remove redundant memory accesses
  • made class vusdot for new vusdot instructions with hasNoSchedulinginfo for new instructions. Existing instructions retain scheduling via use of VDOT class
LukeGeeson marked 3 inline comments as done.Apr 16 2020, 6:08 AM
LukeGeeson added inline comments.
clang/test/CodeGen/arm-v8.6a-neon-intrinsics.c
4

added, test updated

llvm/lib/Target/ARM/ARMInstrNEON.td
4839

Added a vusdot class to handle this

miyuki accepted this revision.Apr 16 2020, 6:49 AM

LGTM, but please wait a couple more days before committing to see if anyone wants to chime in (I guess one week since the upload date should be enough).

This revision is now accepted and ready to land.Apr 16 2020, 6:49 AM
This comment was removed by miyuki.
This revision was automatically updated to reflect the committed changes.
LukeGeeson marked an inline comment as done.
dmgreen added inline comments.
llvm/lib/Target/ARM/ARMInstrNEON.td
4846

I don't think that hasNoSchedulingInfo is necessarily the best way to handle this. That flag is intended for instructions that will never be scheduled, like Pseudo instructions.

If you are running into "Complete Schedule" problems, they might need HasMatMulInt8 added to the list of unsupported features instead.

LukeGeeson marked an inline comment as done.Apr 25 2020, 6:23 AM
LukeGeeson added inline comments.
llvm/lib/Target/ARM/ARMInstrNEON.td
4846

Since there are no 8.6a cpus in llvm that support this extension, the default behaviour is to use Cortex-A57 scheduling - this also has no notion of 8.6a matmul.

This falls back to SchedMachineModel in llvm/lib/Target/ARM/ARMScheduleA57.td and in particular UnsupportedFeatures would be a candidate place to put unsupported features like matmul. I took issue with putting all new extensions here because there should be a separation of concerns between a particular scheduling model, and supporting new behaviour (which could go in a generic catch all location that might be slightly more informative ).

Did you have something in particular in mind?

dmgreen added inline comments.Apr 26 2020, 5:19 AM
llvm/lib/Target/ARM/ARMInstrNEON.td
4846

The ARMSchedule A57schedule is marked with CompleteModel=1. I believe it's the only one ARM-side that does that, but that's why it's running into problems. I'm surprised it hasn't run into the same problems in the past.

On the AArch64 side we tend to mark all Neon instructions with WriteV, so they at least get basic scheduling info. We don't tend to do that for ARM though.

Marking the instruction as hasNoSchedulingInfo will stop _any_ schedule from setting information on them, which whilst technically OK for now would be cause problems in the long run. We should either be adding a Sched[] Write (but I'm not sure what we would add), marking A57 as not complete (which seems like a hack) or adding unsupported features. Or adding scheduling info for these instructions to A57 I guess, but that doesn't sound right for a CPU that doesn't support them!

I can put a patch together to clean it up pretty easily.

Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2020, 5:19 AM