This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ArmSME] Introduce custom ops for SME
ClosedPublic

Authored by awarzynski on Jul 10 2023, 10:36 AM.

Details

Summary

This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension. At the moment, the lowering from the Vector
dialect to SME looks like this:

  • Vector --> SME LLVM IR intrinsics

This patch introduces custom SME ops, so the lowering will look like
this:

  • Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.

This is motivated by 2 considerations:

  1. Storing ZA to memory (e.g. vector.transfer_write) requires an scf.for loop over all rows of ZA. Similar logic will apply to "load to ZA from memory". This is a rather complex transformation and a custom Op seems justified.
  2. As discussed in [1], we need to prevent the LLVM type converter from having to convert types unsupported in LLVM, e.g. vector<[16]x[16]xi8>. A dedicated abstraction layer with custom Ops opens a path to some fine tuning (e.g. custom type converters) that will allow us to avoid this.

This patch introduces two SME Ops: TileStoreOp and ZeroOp. Note that
no new functionality is added - these Ops merely model what's already
supported. In particular, the following tile size is assumed (dimension
and element size are fixed):

  • vector<[16]x[16]xi8>

The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the -convert-vector-to-sme
flag to run it. The following function:

func.func @example(%arg0 : memref<?x?xi8>) {
  // (...)
  %cst = arith.constant dense<0> : vector<[16]x[16]xi8>
  vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
  return
}

would be lowered to:

func.func @example(%arg0: memref<?x?xi8>) {
  // (...)
  %0 = arm_sme.zero : vector<[16]x[16]xi8>
  arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
  return
}

Later, a mechanism will be introduced to guarantee that arm_sme.zero
and arm_sme.tile_store operate on the same virtual tile. For i8
elements this is not required as there is only one tile.

In order to lower the above output to LLVM, use

  • -convert-vector-to-llvm="enable-arm-sme".

[1] https://github.com/openxla/iree/issues/14294

Diff Detail

Event Timeline

awarzynski created this revision.Jul 10 2023, 10:36 AM
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
awarzynski requested review of this revision.Jul 10 2023, 10:36 AM

This is mostly just moving some code around (while trying to address the issues listed in the summary). I'm sending this for early feedback to see what others think. I'll add more tests if this is the desired direction :)

@c-rhodes Would this play nicely with the updates that you are working on? (i.e. to support element sizes other than i8)

Hi Andrzej, thanks for looping me in, I just have one comment regarding the lowering of the ZeroOp.

Of course since you guys are working on the implementation, feel free to ignore my comments if you have something already planned for it.

-Frank

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
60

nit: Since this operation stores the entire ZA tile as opposed to more useful (virtual) tiles, it seems to be more appropriate to name this to save or spill or something similar? I think it makes sense to distinguish this version (using str) with other stores leveraging the st1* instructions

mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
76 ↗(On Diff #538730)

I feel like it makes a bit more sense to lower to ZeroOp from an arith::ConstantOp instead of bundling it with TransferWrite?

It may be a good idea to add a a verifier to make sure ZeroOps are consumed by only SME-compatible ops in the future?

tschuett added inline comments.
mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
79 ↗(On Diff #538730)

If I read this correctly, TileStoreOp does not depend on ZeroOp? Thus, I can rearrange the order of them?

Of course since you guys are working on the implementation, feel free to ignore my comments if you have something already planned for it.

Your feedback is greatly appreciated! I agree with your points, I am just wondering whether to address them in this patch. It's already quite large 🤔 .

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
60

Well, the idea is to iterate this design and to make this Op spill an SME virtual tile - once we can specify tile ID :) (this should happen soon)

I think that are right that it would be good to have a dedicated Op for spilling the whole array, but I will refrain from renaming just now.

mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
76 ↗(On Diff #538730)

I feel like it makes a bit more sense to lower to ZeroOp from an arith::ConstantOp instead of bundling it with TransferWrite?

We bundled arith::ConstantOp with TransferWrite because zero feels a bit pointless if transfer_write cannot be lowered to SME (e.g. because the destination is not a memref). Also, we wanted to demonstrate end-to-end example and leave the finer details for later (i.e. "now").

You are making a very good point though - the current approach won't make sense once we try to fill ZA with something other than 0. I'd rather do it in a separate patch (it will require a few other changes too).

79 ↗(On Diff #538730)

Yeah, good catch! I was trying to work around the type conversion issue by not having any inputs/outputs, but that wont' scale to other element types.

I will be updating this shortly.

Added input to TileStoreOp and output to ZeroOp. This means that the
following:

func.func @example(%arg0: memref<?x?xi8>) {
  // (...)
  arm_sme.zero
  arm_sme.tile_store %arg0 : memref<?x?xi8>
  return
}

becomes:

func.func @example(%arg0: memref<?x?xi8>) {
  // (...)
  %0 = arm_sme.zero : vector<[16]x[16]xi8>
  arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
  return
}

With this update the type conversion issue becomes a bit trickier to address, but
at least the data flow is much easier to reason about.

Summary of other changes:

  • Rebase on top of D154941 (leverage the newly introduced Ops)
  • Remove LowerVectorOps.cpp (it's not needed anymore)
  • More comments, documentation and tests
  • Make the new Ops consume/return values (this is now possible with new ops from D154941)
WanderAway accepted this revision.Jul 11 2023, 8:21 AM

I am just wondering whether to address them in this patch. It's already quite large 🤔 .

Makes sense to me, I'm fine with a separate patch to address these issues.

This revision is now accepted and ready to land.Jul 11 2023, 8:21 AM

Thanks Andzej, I've left some comments and also noticed mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir is failing for me when I tried your patch, please could you check if it also fails for you?

mlir/include/mlir/Conversion/Passes.td
1080
1083

I think we should add arm since that's the full name of the dialect, would also apply to comments/filenames.

1090–1092
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
200

nit: american spellings (unfortunately 😢)

200–202
213

nit: indentation

235

nit: indentation

240
245–247

nit: indentation

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
99

nit: unrelated change

mlir/lib/Conversion/VectorToSME/CMakeLists.txt
13 ↗(On Diff #538730)

this failed to compile for me with undefined symbol errors, I think you're missing this library and MLIRVectorDialect in mlir/lib/Dialect/ArmSME/IR/CMakeLists.txt?

mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
23 ↗(On Diff #539115)

I think we would want vector namespace to make it clear which ops are vector vs SME?

78–79 ↗(On Diff #539115)

I think we could use replaceOpWithNewOp here, apologies if I didnt use that

81 ↗(On Diff #539115)

I think rewriter will take care of removing this if it's dead after replacing the store?

mlir/lib/Conversion/VectorToSME/VectorToSMEPass.cpp
9–22 ↗(On Diff #539115)

quite a few of these aren't used

50–52 ↗(On Diff #539115)

are these necessary?

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
200

nit: american spelling

204

nit: indentation

209–210

nit: indentation

223–232

the cast op should create created after the intrinsic since it represents the tile loaded by the preceding intrinsic

243–244

nit: indentation

249–254

nit: indentation

awarzynski marked 5 inline comments as done.Jul 11 2023, 10:46 AM

Thanks for reviewing @c-rhodes !

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir is failing for me when I tried your patch,

That's because -convert-vector-to-arm-sme was missing, sorry about that :(

mlir/include/mlir/Conversion/Passes.td
1083

To be perfectly honest, I feel that shorter names are better and in general feel that Arm{NEON|SVE|SME} should be renamed as {NEON|SVE|SME}. But I agree that in the meantime we should prioritise consistency.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
200

AFAIK, both spellings are OK as long as you are consistent within a single file? I'm happy to change though.

mlir/lib/Conversion/VectorToSME/CMakeLists.txt
13 ↗(On Diff #538730)

Thanks for catching this - I had shared libs turned off.

mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
23 ↗(On Diff #539115)

Agreed.

mlir/lib/Conversion/VectorToSME/VectorToSMEPass.cpp
9–22 ↗(On Diff #539115)

Thanks!

What do you use to find unused headers? I've updated my Vim LSP recently and haven't had a chance to restore that functionality yet :(

Addressing comments from Cullen

  • Fixed CMake
  • Fixed test
  • Fixed formatting

Rename "toSME" --> "toArmSME" (variable + file names)

Also removed more "unused" headers and simplified the new pass.

c-rhodes added inline comments.Jul 12 2023, 1:50 AM
mlir/include/mlir/Conversion/Passes.td
1083

To be perfectly honest, I feel that shorter names are better and in general feel that Arm{NEON|SVE|SME} should be renamed as {NEON|SVE|SME}. But I agree that in the meantime we should prioritise consistency.

There's a good reason for keeping Arm in the name, I think NEON has been around long enough for people to recognise it as an Arm technology, but SVE/SME have generic names and we have to be cognisant most people probably don't know what they are, 3 extra characters to add clarity seems like a small price to pay to me.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
200

AFAIK, both spellings are OK as long as you are consistent within a single file? I'm happy to change though.

Ah ok, coming from LLVM/Clang I thought American spellings were standard, apologies if that's not the case.

236

nit: indentation

237–247

nit: indentation

mlir/lib/Conversion/VectorToArmSME/CMakeLists.txt
15

I think this can be removed?

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
2

filename needs updating here

11

nit: empty line

15–16

unused?

mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp
11

nit: empty line

mlir/lib/Conversion/VectorToSME/VectorToSMEPass.cpp
9–22 ↗(On Diff #539115)

Thanks!

What do you use to find unused headers? I've updated my Vim LSP recently and haven't had a chance to restore that functionality yet :(

I don't use a tool, I checked out your patch looked at the code and removed ones I couldnt see were used then verified by compiling.

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
223–232

Please could you look at this again, the cast is still created before the intrinsic.

224

getVectorType?

262–264

the cast can be removed

awarzynski marked 5 inline comments as done.Jul 12 2023, 2:14 AM

Thanks Cullen! I will be sending an update shortly.

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
11

AFAIK, there are no code style rules for this sort of things apart from:

The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system.

And keeping an empty line between the main module include and other header files is quite common in MLIR:

https://github.com/llvm/llvm-project/blob/60c9d2993bbf1594e89e1e6f72e1472eb1aeb8ef/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L13-L14

Incorporate the latests suggestions from Cullen

Sorry for the delay. Some comments, most of them nits.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
199

What are the side effects of this op?

200

ZA -> ZA tile/tile register/register?

216–218

Curious... getType or getResultType (or similar ones, auto-generated) should return a VectorType if nxnxv... are defined as vectors. Isn't that the case? Do we need this method for some other reason then?

223

This one should at least have memory side effects

234

Would it make sense to align the operand order with the rest of store ops in MLIR? I.e., value-to-store, dst-memref [indices] : vector-type, memref-type?

mlir/lib/Conversion/CMakeLists.txt
55

sort

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
38

missing operand and types?

39

nit: TransferWriteToArmSMELowering?

58

if memRefType is not used beyond the condition you should use isa instead of dyn_cast

mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp
28–30

We should move the dependencies to the .td file. There is a way to have them defined there and have the code autogenerated.

mlir/lib/Dialect/ArmSME/IR/CMakeLists.txt
14

sort

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
210

add getTileId? Otherwise, it's not clear where %1 is coming from

223–232

Yes, you should be able to do rewriter.replaceOpWithNewOp(zero, ....

233

Ok, I see what you are trying to do here... and can't think of a better way. This is more like propagating information (getTileId) across different op converters but through the IR. I think I tried to do something similar by introducing a state in the converters but I barely remember. I'm ok with this.

239

Something important here: we introduce the SME lowering layer to explicitly model what is needed for SME and make the conversion to LLVM easier. However, here we are materializing a loop. I'm wondering why that loop is not generated when we move from Vector to the SME dialect and then the conversion to LLVM is mostly a 1:1 translation to the intrinsics.

c-rhodes added inline comments.Jul 13 2023, 1:14 AM
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
11

AFAIK, there are no code style rules for this sort of things apart from:

The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system.

And keeping an empty line between the main module include and other header files is quite common in MLIR:

https://github.com/llvm/llvm-project/blob/60c9d2993bbf1594e89e1e6f72e1472eb1aeb8ef/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L13-L14

Hadn't noticed that, thanks for pointing that out

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239

Something important here: we introduce the SME lowering layer to explicitly model what is needed for SME and make the conversion to LLVM easier. However, here we are materializing a loop. I'm wondering why that loop is not generated when we move from Vector to the SME dialect and then the conversion to LLVM is mostly a 1:1 translation to the intrinsics.

I've also been thinking about this, the load/stores in SME operate on ZA array vectors or tile slices, which are 1-d scalable vectors of SVL bits, rather than an entire tile, hence the loop materialization. Perhaps if we had custom ops that deal with tile vectors the loop could be emitted when going from Vector -> SME and these would later map 1-1 with LLVM intrinsics. We'll consider what we can do here, thanks for raising this.

awarzynski marked 8 inline comments as done.Jul 13 2023, 3:42 AM

Thanks for reviewing!

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
216–218

No, it returns an abstract Type that you then have to cast to Vector. At least that's what I'm seeing 🤔 .

223

See [MemWrite] on L238:

let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base,
                 Variadic<Index>:$indices,
                 nxnxv16i8:$valueToStore);
234

Good shout! I will align this with Vector_StoreOp.

mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp
28–30

Annoyingly, that's already there :) Good catch, thanks!

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
223–232

Apologies @c-rhodes , I missed this comment. Will be updating shortly.

the cast op should create created after the intrinsic since it represents the tile loaded by the preceding intrinsic

Do you think that the order will matter in practice? Otherwise somebody could just rewrite your suggestion as:

the cast op should create created after the intrinsic since it represents the tile loaded by the _following_ intrinsic

IIUC, the order does not matter, but might be missing something? Regardless, we should definitely make sure that we are consistent and I am happy with "after" (i.e. your suggestion).

239

Good points, thanks! Now that you have raised this I see that this abstraction should be re-fined.

Is it OK to iterate in future patches though? There's a few other patches that depend on one another, so I would land this as is and refactor separately. My main goal is to get the overall scaffolding in first (i.e. the "Vector to SME" pass). WDYT?

Incorporate suggestions from Diego, thanks!

c-rhodes added inline comments.Jul 13 2023, 3:47 AM
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
39

I noticed other rewrites are in an empty namespace, do we need one here?

Add an anonymous namespace

awarzynski added inline comments.Jul 13 2023, 4:48 AM
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
39

Done :)

thanks for the updates Andrzej this is really taking shape, just a few more comments :)

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
237

example needs updating now the operand order has changed

242

nit: move to above line or indent to make it clear it applies to the memref

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
165–202

move this to bottom alongside populateArmSMELegalizeForLLVMExportPatterns?

227–228

this should be created before the zero, and we should add a note that get_tile_id and zero aren't chain together yet

247–252

nit: the variable names could be improved, %3 -> %vscale for example

mlir/test/Dialect/ArmSME/roundtrip.mlir
176

hasn't the operand order been changed so this comes first? Surprised this test passed

mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir
39

i think we should keep a CHECK-NOT?

awarzynski marked 2 inline comments as done.

Update the assembly format for TileStoreOp

Thanks Cullen - that's a very thorough and much appreciated review! I've just updated the patch (before sending my replies), so my comments will be a bit out of sync, sorry.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
237

Argh, assembly format needs updating too. Please double check. I am trying to align with VectorStoreOp, but this looks off: assembly format for VectorStoreOp.

mlir/test/Dialect/ArmSME/roundtrip.mlir
176

I've not changed the assembly format yet ;-)

mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir
39

Removed by accident, ta!

Matt added a subscriber: Matt.Jul 13 2023, 2:37 PM
c-rhodes accepted this revision.Jul 14 2023, 2:01 AM

Just a couple minor comments but otherwise LGTM! Cheers

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239

Good points, thanks! Now that you have raised this I see that this abstraction should be re-fined.

Is it OK to iterate in future patches though? There's a few other patches that depend on one another, so I would land this as is and refactor separately. My main goal is to get the overall scaffolding in first (i.e. the "Vector to SME" pass). WDYT?

Yeah that can be done separately.

mlir/test/Dialect/ArmSME/roundtrip.mlir
173

nit: space before ":" for consistency

mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
9–10

the order here is important, should we be using CHECK-DAG?

awarzynski added inline comments.Jul 17 2023, 7:51 AM
mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
9–10

These will always be ordered correctly as there is a dependency expressed via TILE_ID:

// CHECK-DAG:  %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8
// CHECK-DAG:  %[[CAST_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>

So I think that it should be OK.

Update the assembly format for arm_sme.tile_store to match vector.store:

arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>

rather than:

arm_sme.tile_store %tile, %dest[%c0, %c0] : vector<[16]x[16]xi8>, memref<?x?xi8>
dcaballe accepted this revision.Jul 17 2023, 9:41 AM

Thanks for addressing the comments! LGTM!

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239

It sounds good to me to do this separately but this is a big abstraction change so hopefully we can do it sooner than later. If you think the non-loop abstraction is also useful, we could also have two level of abtractions within the same dialect, where we go first to the non-loop one and then materialize the loop at some point within the SME dialect. The Vector dialect is a good example of this.

This revision was landed with ongoing or failed builds.Jul 18 2023, 1:07 AM
This revision was automatically updated to reflect the committed changes.
c-rhodes added inline comments.Jul 27 2023, 11:09 AM
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239

It sounds good to me to do this separately but this is a big abstraction change so hopefully we can do it sooner than later. If you think the non-loop abstraction is also useful, we could also have two level of abtractions within the same dialect, where we go first to the non-loop one and then materialize the loop at some point within the SME dialect. The Vector dialect is a good example of this.

I've shared an update on Discourse: https://discourse.llvm.org/t/loop-materialization-in-armsme/72354

And a solution in D156467

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp