This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension. At the moment, the lowering from the Vector
dialect to SME looks like this:
- Vector --> SME LLVM IR intrinsics
This patch introduces custom SME ops, so the lowering will look like
this:
- Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.
This is motivated by two considerations:
- Storing ZA to memory (e.g. vector.transfer_write) requires an scf.for loop over all rows of ZA. Similar logic will apply to "load to ZA from memory". This is a rather complex transformation and a custom Op seems justified.
- As discussed in [1], we need to prevent the LLVM type converter from having to convert types unsupported in LLVM, e.g. vector<[16]x[16]xi8>. A dedicated abstraction layer with custom Ops opens a path to some fine tuning (e.g. custom type converters) that will allow us to avoid this.
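To illustrate the first point, storing ZA to memory can be sketched as a loop over the tile's horizontal slices (rows). This is only a rough sketch: the loop bound %svl_b (streaming vector length in bytes), the operand names, and the "arm_sme.intr.str" intrinsic spelling are assumptions for illustration, not the exact ops this patch emits:

```mlir
// Hypothetical sketch: store the ZA tile to memory one row at a time.
// %svl_b, %base_ptr and the intrinsic signature are illustrative only.
scf.for %row = %c0 to %svl_b step %c1 {
  %row_i32 = arith.index_cast %row : index to i32
  // Store one horizontal tile slice (row) of ZA to the given address.
  "arm_sme.intr.str"(%row_i32, %base_ptr) : (i32, !llvm.ptr) -> ()
}
```

Hiding this loop behind a single arm_sme.tile_store op keeps the Vector-to-SME conversion simple and defers the row-wise expansion to a later lowering.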
This patch introduces two SME Ops: TileStoreOp and ZeroOp. Note that
no new functionality is added - these Ops merely model what's already
supported. In particular, the following tile size is assumed (dimension
and element size are fixed):
- vector<[16]x[16]xi8>
The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the -convert-vector-to-sme
flag to run it. The following function:
  func.func @example(%arg0 : memref<?x?xi8>) {
    // (...)
    %cst = arith.constant dense<0> : vector<[16]x[16]xi8>
    vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
    return
  }
would be lowered to:
  func.func @example(%arg0: memref<?x?xi8>) {
    // (...)
    %0 = arm_sme.zero : vector<[16]x[16]xi8>
    arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
    return
  }
Later, a mechanism will be introduced to guarantee that arm_sme.zero
and arm_sme.tile_store operate on the same virtual tile. For i8
elements this is not required as there is only one tile.
In order to lower the above output to LLVM, use
-convert-vector-to-llvm="enable-arm-sme".
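Putting the two passes together, a full pipeline invocation could look like the following (example.mlir is a hypothetical input file name; the flag spellings are the ones introduced above):

```
mlir-opt example.mlir \
  -convert-vector-to-sme \
  -convert-vector-to-llvm="enable-arm-sme"
```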