This patch adds the initial implementation of the ArmSME dialect.
The Scalable Matrix Extension (SME) is an extension to the Scalable Vector Extension (SVE) for AArch64. It focuses on outer-product instructions that accelerate matrix multiplication using a 2D tile register (ZA), which is split into multiple smaller square tiles (ZA[0-3]s, ZA[0-7]d).
More information on the architecture itself can be found [[ https://developer.arm.com/documentation/ddi0616/aa | here]].
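
As a rough illustration of the tile layout described above, the following sketch computes how many ZA tiles exist for a given element width and how large each one is. The function name `za_tiles` and its parameters are hypothetical and are not part of this patch. It assumes the streaming vector length (SVL) is a power of two between 128 and 2048 bits, which is the range SME defines.

```python
def za_tiles(svl_bits, elem_bits):
    """Return (num_tiles, rows, cols) for ZA tiles of one element width.

    Hypothetical helper for illustration only; not part of the dialect.
    """
    # Assumption: SVL is a power of two in [128, 2048] bits.
    if svl_bits < 128 or svl_bits > 2048 or svl_bits & (svl_bits - 1):
        raise ValueError("SVL must be a power of two between 128 and 2048")
    if elem_bits not in (8, 16, 32, 64, 128):
        raise ValueError("unsupported element width")

    # ZA is a square array of SVL/8 x SVL/8 bytes. For an element that is
    # N bytes wide there are N tiles, each with SVL/elem_bits rows and columns.
    num_tiles = elem_bits // 8
    dim = svl_bits // elem_bits
    return num_tiles, dim, dim


if __name__ == "__main__":
    for svl in (128, 256, 512):
        print(svl, "32-bit:", za_tiles(svl, 32), "64-bit:", za_tiles(svl, 64))
```

With 32-bit elements this gives the four tiles written ZA[0-3]s above, and with 64-bit elements the eight tiles written ZA[0-7]d; only the tile dimensions change with the vector length.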
Currently, this patch defines most of the instructions in the extension, but lowering is only supported for the non-widening (i.e. fp32 and fp64) variants of the MOPA and MOPS ops, in addition to the ZERO op.
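
For readers unfamiliar with the ops named above, here is a minimal scalar model of what the non-widening MOPA/MOPS and ZERO ops compute. This is only a sketch of the arithmetic: the function names `zero_tile`, `mopa` and `matmul_via_mopa` are hypothetical, predication is deliberately left out, and none of this reflects the dialect's op syntax or the lowering in this patch.

```python
def zero_tile(rows, cols):
    """Model of ZERO: a tile with every element cleared."""
    return [[0.0] * cols for _ in range(rows)]


def mopa(tile, a, b, subtract=False):
    """Model of MOPA (or MOPS when subtract=True), without predication.

    Accumulates the outer product of vectors a and b into the tile:
    tile[i][j] += a[i] * b[j], or -= for MOPS.
    """
    sign = -1.0 if subtract else 1.0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            tile[i][j] += sign * ai * bj
    return tile


def matmul_via_mopa(A, B):
    """Multiply A (M x K) by B (K x N) as a sum of K outer products."""
    M, K, N = len(A), len(B), len(B[0])
    tile = zero_tile(M, N)
    for k in range(K):
        column_of_a = [A[i][k] for i in range(M)]
        row_of_b = B[k]
        mopa(tile, column_of_a, row_of_b)
    return tile


if __name__ == "__main__":
    print(matmul_via_mopa([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each iteration of the `k` loop corresponds to one outer-product accumulate, which is why a matrix multiply maps naturally onto a sequence of MOPA ops on a single tile.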
The implementation of this dialect is heavily influenced by the existing ArmSVE dialect.
The plan is to connect this dialect to the vector dialect, either through the existing `OuterProductOp` or by introducing a `MaskedOuterProductOp`. Additionally, access to vectors within the SME tile register should be implemented through the new load/store/move instructions.