This patch adds support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch32, including the
corresponding IR intrinsics. Tests are provided as needed.
This patch is part of a series implementing the Bfloat16 extension of
the Armv8.6-a architecture, as detailed here:
The bfloat type and its properties are specified in the Arm
Architecture Reference Manual:
The following people contributed to this patch:
- Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
- Simon Tatham
Would it be sufficient to run the tests through opt -mem2reg -instcombine instead of the whole -O2 pipeline?