This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension. At the moment, the lowering from the Vector
dialect to SME looks like this:
- Vector --> SME LLVM IR intrinsics
This patch introduces custom SME ops, so the lowering will look like
this:
- Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.
This is motivated by two considerations:
- Storing ZA to memory (e.g. vector.transfer_write) requires an scf.for loop over all rows of ZA. Similar logic will apply to "load to ZA from memory". This is a rather complex transformation and a custom Op seems justified.
- As discussed in [1], we need to prevent the LLVM type converter from having to convert types unsupported in LLVM, e.g. vector<[16]x[16]xi8>. A dedicated abstraction layer with custom Ops opens a path to some fine tuning (e.g. custom type converters) that will allow us to avoid this.
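To illustrate the first point, storing ZA to memory can be sketched as a loop over the tile's horizontal slices (rows). This is only a rough sketch: the loop bound %svl_b (streaming vector length in bytes), the operand names, and the "arm_sme.intr.str" intrinsic spelling are assumptions for illustration, not the exact ops this patch emits:

```mlir
// Hypothetical sketch: store the ZA tile to memory one row at a time.
// %svl_b, %base_ptr and the intrinsic signature are illustrative only.
scf.for %row = %c0 to %svl_b step %c1 {
  %row_i32 = arith.index_cast %row : index to i32
  // Store one horizontal tile slice (row) of ZA to the given address.
  "arm_sme.intr.str"(%row_i32, %base_ptr) : (i32, !llvm.ptr) -> ()
}
```

Hiding this loop behind a single arm_sme.tile_store op keeps the Vector-to-SME conversion simple and defers the row-wise expansion to a later lowering.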
This patch introduces two SME Ops: TileStoreOp and ZeroOp. Note that
no new functionality is added - these Ops merely model what's already
supported. In particular, the following tile size is assumed (dimension
and element size are fixed):
- vector<[16]x[16]xi8>
The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the -convert-vector-to-sme
flag to run it. The following function:
  func.func @example(%arg0 : memref<?x?xi8>) {
    // (...)
    %cst = arith.constant dense<0> : vector<[16]x[16]xi8>
    vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
    return
  }
would be lowered to:
  func.func @example(%arg0: memref<?x?xi8>) {
    // (...)
    %0 = arm_sme.zero : vector<[16]x[16]xi8>
    arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
    return
  }
Later, a mechanism will be introduced to guarantee that arm_sme.zero
and arm_sme.tile_store operate on the same virtual tile. For i8
elements this is not required as there is only one tile.
In order to lower the above output to LLVM, use
-convert-vector-to-llvm="enable-arm-sme".
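Putting the two passes together, a full pipeline invocation could look like the following (example.mlir is a hypothetical input file name; the flag spellings are the ones introduced above):

```
mlir-opt example.mlir \
  -convert-vector-to-sme \
  -convert-vector-to-llvm="enable-arm-sme"
```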