This is an archive of the discontinued LLVM Phabricator instance.


Differential D154867

[mlir][ArmSME] Introduce custom ops for SME
Closed · Public

Authored by awarzynski on Jul 10 2023, 10:36 AM.


Details

Reviewers

dcaballe
c-rhodes
WanderAway
aartbik
ftynse
nicolasvasilache

Commits

rG447bb5bee402: [mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME)

Summary

This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension. At the moment, the lowering from the Vector
dialect to SME looks like this:

Vector --> SME LLVM IR intrinsics

This patch introduces custom SME ops, so the lowering will look like
this:

Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.

This is motivated by two considerations:

1. Storing ZA to memory (e.g. vector.transfer_write) requires an scf.for loop over all rows of ZA. Similar logic will apply to loading from memory into ZA. This is a rather complex transformation, so a custom Op seems justified.
2. As discussed in [1], we need to prevent the LLVM type converter from having to convert types that are unsupported in LLVM, e.g. vector<[16]x[16]xi8>. A dedicated abstraction layer with custom Ops opens a path to some fine-tuning (e.g. custom type converters) that will allow us to avoid this.
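The row-by-row store described in the first point can be modelled outside MLIR. The following is a minimal standalone C++ sketch, not code from this patch: `ZATileI8`, `MemRef2D`, `storeTileSlice` and `storeTile` are hypothetical names, the tile is assumed to be a single vector<[16]x[16]xi8> value with 16 * vscale rows and columns, and each row stands in for one tile slice (a 1-D vector of SVL bits) written by a single store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one ZA tile holding i8 elements, i.e. a value of
// type vector<[16]x[16]xi8>: both dimensions are 16 * vscale at runtime.
struct ZATileI8 {
  std::size_t dim;                // rows == cols == 16 * vscale
  std::vector<std::int8_t> data;  // row-major, dim * dim bytes

  explicit ZATileI8(std::size_t vscale)
      : dim(16 * vscale), data(dim * dim, 0) {}
};

// Hypothetical row-major buffer standing in for memref<?x?xi8>.
struct MemRef2D {
  std::size_t rows, cols;
  std::vector<std::int8_t> data;

  MemRef2D(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, -1) {}
  std::int8_t &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// Store one horizontal tile slice (a single row, i.e. one 1-D vector of
// SVL bits) to dest, starting at position (i0 + row, j0).
void storeTileSlice(const ZATileI8 &tile, std::size_t row, MemRef2D &dest,
                    std::size_t i0, std::size_t j0) {
  for (std::size_t j = 0; j < tile.dim; ++j)
    dest.at(i0 + row, j0 + j) = tile.data[row * tile.dim + j];
}

// Storing the whole tile needs a loop over all rows, mirroring the scf.for
// over the rows of ZA: one slice store per iteration.
void storeTile(const ZATileI8 &tile, MemRef2D &dest, std::size_t i0,
               std::size_t j0) {
  assert(i0 + tile.dim <= dest.rows && j0 + tile.dim <= dest.cols);
  for (std::size_t row = 0; row < tile.dim; ++row)
    storeTileSlice(tile, row, dest, i0, j0);
}
```

In this model the trip count depends on vscale, which is only known at runtime, so the loop cannot simply be unrolled at compile time.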

This patch introduces two SME Ops: TileStoreOp and ZeroOp. Note that
no new functionality is added - these Ops merely model what's already
supported. In particular, the following tile size is assumed (dimension
and element size are fixed):

vector<[16]x[16]xi8>
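To make the scalable tile type concrete, here is a small hedged C++ sketch of the arithmetic involved. It assumes that, in streaming mode, vscale equals SVL / 128 and that SVL is a power of two between 128 and 2048 bits; the helper names are hypothetical and not part of MLIR.

```cpp
#include <cassert>

// SME streaming vector lengths (SVL) are powers of two from 128 to 2048 bits.
constexpr bool isValidSVL(unsigned svlBits) {
  return svlBits >= 128 && svlBits <= 2048 && (svlBits & (svlBits - 1)) == 0;
}

// Assumption: vscale is the number of 128-bit granules in SVL.
constexpr unsigned vscaleFor(unsigned svlBits) { return svlBits / 128; }

// Runtime extent of a scalable dimension written as [minElems].
constexpr unsigned scalableDim(unsigned minElems, unsigned svlBits) {
  return minElems * vscaleFor(svlBits);
}

// Size of the whole ZA array in bytes: SVL_B x SVL_B with SVL_B = SVL / 8.
constexpr unsigned zaSizeBytes(unsigned svlBits) {
  return (svlBits / 8) * (svlBits / 8);
}

// Size in bytes of one vector<[16]x[16]xi8> value (1 byte per element).
unsigned tileBytesI8(unsigned svlBits) {
  assert(isValidSVL(svlBits));
  unsigned dim = scalableDim(16, svlBits);
  return dim * dim;
}
```

For example, with a 512-bit SVL, vscale is 4 and vector<[16]x[16]xi8> is a 64x64 tile of 4096 bytes, the same size as the whole ZA array.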

The new lowering layer is introduced via a conversion pass between the
Vector and the ArmSME dialects. You can use the -convert-vector-to-arm-sme
flag to run it. The following function:

func.func @example(%arg0 : memref<?x?xi8>) {
  // (...)
  %cst = arith.constant dense<0> : vector<[16]x[16]xi8>
  vector.transfer_write %cst, %arg0[%c0, %c0] : vector<[16]x[16]xi8>, memref<?x?xi8>
  return
}

would be lowered to:

func.func @example(%arg0: memref<?x?xi8>) {
  // (...)
  %0 = arm_sme.zero : vector<[16]x[16]xi8>
  arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
  return
}

Later, a mechanism will be introduced to guarantee that arm_sme.zero
and arm_sme.tile_store operate on the same virtual tile. For i8
elements this is not required as there is only one tile.
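The claim that i8 has only one tile follows from how ZA is partitioned: for an element width of W bits, ZA is divided into W / 8 square tiles, each with SVL / W rows and columns. The sketch below is a minimal C++ model of that rule, with hypothetical helper names, intended only to show why tile selection starts to matter once wider element types are supported.

```cpp
#include <cassert>

// Element widths (in bits) for which ZA defines tiles.
constexpr bool isSupportedElemBits(unsigned elemBits) {
  return elemBits == 8 || elemBits == 16 || elemBits == 32 || elemBits == 64 ||
         elemBits == 128;
}

// Number of ZA tiles for an element width: 1 for i8, 2 for i16, 4 for i32,
// 8 for i64 and 16 for i128.
unsigned numZATiles(unsigned elemBits) {
  assert(isSupportedElemBits(elemBits));
  return elemBits / 8;
}

// Rows (and columns) of one tile: SVL / elemBits elements per slice.
unsigned tileDim(unsigned svlBits, unsigned elemBits) {
  assert(isSupportedElemBits(elemBits));
  return svlBits / elemBits;
}

// Bytes covered by all tiles of one element width. This always equals the
// size of the whole ZA array, (SVL / 8) * (SVL / 8).
unsigned allTilesBytes(unsigned svlBits, unsigned elemBits) {
  unsigned d = tileDim(svlBits, elemBits);
  return numZATiles(elemBits) * d * d * (elemBits / 8);
}
```

With a 512-bit SVL, for example, i32 elements give 4 tiles of 16x16, so arm_sme.zero and arm_sme.tile_store would need to agree on which of them to use.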

In order to lower the above output to LLVM, use

-convert-vector-to-llvm="enable-arm-sme".

[1] https://github.com/openxla/iree/issues/14294

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

awarzynski created this revision. Jul 10 2023, 10:36 AM

Herald added a reviewer: aartbik. Jul 10 2023, 10:36 AM

Herald added a reviewer: ftynse.

Herald added a project: Restricted Project.

Herald added subscribers: gysit, Dinistro, bviyer and 24 others.

awarzynski requested review of this revision. Jul 10 2023, 10:36 AM

Herald added a reviewer: nicolasvasilache. Jul 10 2023, 10:36 AM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

This is mostly just moving some code around (while trying to address the issues listed in the summary). I'm sending this for early feedback to see what others think. I'll add more tests if this is the desired direction :)

@c-rhodes Would this play nicely with the updates that you are working on? (i.e. to support element sizes other than i8)

Hi Andrzej, thanks for looping me in, I just have one comment regarding the lowering of the ZeroOp.

Of course since you guys are working on the implementation, feel free to ignore my comments if you have something already planned for it.

-Frank

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
60	nit: Since this operation stores the entire ZA tile as opposed to more useful (virtual) tiles, it seems more appropriate to name this `save`, `spill` or something similar? I think it makes sense to distinguish this version (using `str`) from other stores leveraging the `st1*` instructions
mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
76 ↗	(On Diff #538730)	I feel like it makes a bit more sense to lower to `ZeroOp` from an `arith::ConstantOp` instead of bundling it with `TransferWrite`? It may be a good idea to add a verifier to make sure `ZeroOp`s are consumed by only SME-compatible ops in the future?

tschuett added a subscriber: tschuett. Jul 10 2023, 11:58 AM

tschuett added inline comments.

mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
79 ↗	(On Diff #538730)	If I read this correctly, TileStoreOp does not depend on ZeroOp? Thus, I can rearrange the order of them?

Harbormaster completed remote builds in B244209: Diff 538730. Jul 10 2023, 12:07 PM

In D154867#4486211, @WanderAway wrote:

Of course since you guys are working on the implementation, feel free to ignore my comments if you have something already planned for it.

Your feedback is greatly appreciated! I agree with your points; I am just wondering whether to address them in this patch. It's already quite large 🤔.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
60	Well, the idea is to iterate on this design and to make this Op spill an SME virtual tile once we can specify the tile ID :) (this should happen soon). I think that you are right that it would be good to have a dedicated Op for spilling the whole array, but I will refrain from renaming just now.
mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
76 ↗	(On Diff #538730)	I feel like it makes a bit more sense to lower to ZeroOp from an arith::ConstantOp instead of bundling it with TransferWrite? We bundled arith::ConstantOp with TransferWrite because `zero` feels a bit pointless if `transfer_write` cannot be lowered to SME (e.g. because the destination is not a `memref`). Also, we wanted to demonstrate an end-to-end example and leave the finer details for later (i.e. "now"). You are making a very good point though: the current approach won't make sense once we try to fill `ZA` with something other than 0. I'd rather do it in a separate patch (it will require a few other changes too).
79 ↗	(On Diff #538730)	Yeah, good catch! I was trying to work around the type conversion issue by not having any inputs/outputs, but that won't scale to other element types. I will be updating this shortly.

Added input to TileStoreOp and output to ZeroOp. This means that the
following:

func.func @example(%arg0: memref<?x?xi8>) {
  // (...)
  arm_sme.zero
  arm_sme.tile_store %arg0 : memref<?x?xi8>
  return
}

becomes:

func.func @example(%arg0: memref<?x?xi8>) {
  // (...)
  %0 = arm_sme.zero : vector<[16]x[16]xi8>
  arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
  return
}

With this update the type conversion issue becomes a bit trickier to address, but
at least the data flow is much easier to reason about.

Summary of other changes:

Rebase on top of D154941 (leverage the newly introduced Ops)
Remove LowerVectorOps.cpp (it's not needed anymore)
More comments, documentation and tests
Make the new Ops consume/return values (this is now possible with new ops from D154941)

Harbormaster completed remote builds in B244486: Diff 539115. Jul 11 2023, 8:14 AM

awarzynski edited the summary of this revision. Jul 11 2023, 8:15 AM

awarzynski added a parent revision: D154941: [mlir][ArmSME] Add custom get_tile_id and cast ops.

I am just wondering whether to address them in this patch. It's already quite large 🤔.

Makes sense to me, I'm fine with a separate patch to address these issues.

This revision is now accepted and ready to land. Jul 11 2023, 8:21 AM

awarzynski added a child revision: D154302: [mlir][nfc] Clarify the limitation on scalable vectors. Jul 11 2023, 8:55 AM

awarzynski mentioned this in D154302: [mlir][nfc] Clarify the limitation on scalable vectors. Jul 11 2023, 8:56 AM

Thanks Andrzej, I've left some comments and also noticed mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir is failing for me when I tried your patch, please could you check if it also fails for you?

mlir/include/mlir/Conversion/Passes.td
1080
1083	I think we should add `arm`, since that's the full name of the dialect; this would also apply to comments/filenames.
1090–1092
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
200	nit: American spellings (unfortunately 😢)
200–202
213	nit: indentation
235	nit: indentation
240
245–247	nit: indentation
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
99	nit: unrelated change
mlir/lib/Conversion/VectorToSME/CMakeLists.txt
13 ↗	(On Diff #538730)	this failed to compile for me with undefined symbol errors, I think you're missing this library and `MLIRVectorDialect` in `mlir/lib/Dialect/ArmSME/IR/CMakeLists.txt`?
mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
23 ↗	(On Diff #539115)	I think we would want vector namespace to make it clear which ops are vector vs SME?
78–79 ↗	(On Diff #539115)	I think we could use replaceOpWithNewOp here, apologies if I didn't use that
81 ↗	(On Diff #539115)	I think rewriter will take care of removing this if it's dead after replacing the store?
mlir/lib/Conversion/VectorToSME/VectorToSMEPass.cpp
9–22 ↗	(On Diff #539115)	quite a few of these aren't used
50–52 ↗	(On Diff #539115)	are these necessary?
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
200	nit: American spelling
204	nit: indentation
209–210	nit: indentation
223–232	the cast op should be created after the intrinsic since it represents the tile loaded by the preceding intrinsic
243–244	nit: indentation
249–254	nit: indentation

Thanks for reviewing @c-rhodes !

In D154867#4489909, @c-rhodes wrote:

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir is failing for me when I tried your patch,

That's because -convert-vector-to-arm-sme was missing, sorry about that :(

mlir/include/mlir/Conversion/Passes.td
1083	To be perfectly honest, I feel that shorter names are better and in general feel that `Arm{NEON|SVE|SME}` should be renamed as `{NEON|SVE|SME}`. But I agree that in the meantime we should prioritise consistency.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
200	AFAIK, both spellings are OK as long as you are consistent within a single file? I'm happy to change though.
mlir/lib/Conversion/VectorToSME/CMakeLists.txt
13 ↗	(On Diff #538730)	Thanks for catching this - I had shared libs turned off.
mlir/lib/Conversion/VectorToSME/VectorToSME.cpp
23 ↗	(On Diff #539115)	Agreed.
mlir/lib/Conversion/VectorToSME/VectorToSMEPass.cpp
9–22 ↗	(On Diff #539115)	Thanks! What do you use to find unused headers? I've updated my Vim LSP recently and haven't had a chance to restore that functionality yet :(

awarzynski updated this revision to Diff 539190. Jul 11 2023, 10:46 AM

Addressing comments from Cullen

Fixed CMake
Fixed test
Fixed formatting

Herald added subscribers: ThomasRaoux, jsetoain. Jul 11 2023, 10:46 AM

Harbormaster completed remote builds in B244536: Diff 539190. Jul 11 2023, 3:37 PM

Rename "toSME" --> "toArmSME" (variable + file names)

Also removed more "unused" headers and simplified the new pass.

c-rhodes added inline comments. Jul 12 2023, 1:50 AM

mlir/include/mlir/Conversion/Passes.td
1083	To be perfectly honest, I feel that shorter names are better and in general feel that `Arm{NEON|SVE|SME}` should be renamed as `{NEON|SVE|SME}`. But I agree that in the meantime we should prioritise consistency. There's a good reason for keeping Arm in the name. I think NEON has been around long enough for people to recognise it as an Arm technology, but SVE/SME have generic names and we have to be cognisant that most people probably don't know what they are. 3 extra characters to add clarity seems like a small price to pay to me.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
200	AFAIK, both spellings are OK as long as you are consistent within a single file? I'm happy to change though. Ah ok, coming from LLVM/Clang I thought American spellings were standard, apologies if that's not the case.
236	nit: indentation
237–247	nit: indentation
mlir/lib/Conversion/VectorToArmSME/CMakeLists.txt
15	I think this can be removed?
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
2	filename needs updating here
11	nit: empty line
15–16	unused?
mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp
11	nit: empty line
mlir/lib/Conversion/VectorToSME/VectorToSMEPass.cpp
9–22 ↗	(On Diff #539115)	Thanks! What do you use to find unused headers? I've updated my Vim LSP recently and haven't had a chance to restore that functionality yet :( I don't use a tool. I checked out your patch, looked at the code, removed the ones I couldn't see were used, then verified by compiling.
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
223–232	Please could you look at this again, the cast is still created before the intrinsic.
224	getVectorType?
262–264	the cast can be removed

awarzynski mentioned this in D154941: [mlir][ArmSME] Add custom get_tile_id and cast ops. Jul 12 2023, 1:51 AM

Thanks Cullen! I will be sending an update shortly.

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
11	AFAIK, there are no code style rules for this sort of things apart from: The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system. And keeping an empty line between the main module include and other header files is quite common in MLIR: https://github.com/llvm/llvm-project/blob/60c9d2993bbf1594e89e1e6f72e1472eb1aeb8ef/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L13-L14

Incorporate the latest suggestions from Cullen

Harbormaster completed remote builds in B244702: Diff 539433. Jul 12 2023, 3:58 AM

Sorry for the delay. Some comments, most of them nits.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
199	What are the side effects of this op?
200	ZA -> ZA tile/tile register/register?
216–218	Curious... `getType` or `getResultType` (or similar ones, auto-generated) should return a `VectorType` if `nxnxv...` are defined as vectors. Isn't that the case? Do we need this method for some other reason then?
223	This one should at least have memory side effects
234	Would it make sense to align the operand order with the rest of store ops in MLIR? I.e., `value-to-store, dst-memref [indices] : vector-type, memref-type`?
mlir/lib/Conversion/CMakeLists.txt
55	sort
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
38	missing operand and types?
39	nit: TransferWriteToArmSMELowering?
58	if `memRefType` is not used beyond the condition you should use `isa` instead of `dyn_cast`
mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp
28–30	We should move the dependencies to the .td file. There is a way to have them defined there and have the code autogenerated.
mlir/lib/Dialect/ArmSME/IR/CMakeLists.txt
14	sort
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
210	add getTileId? Otherwise, it's not clear where %1 is coming from
223–232	Yes, you should be able to do `rewriter.replaceOpWithNewOp(zero, ...`.
233	Ok, I see what you are trying to do here... and can't think of a better way. This is more like propagating information (getTileId) across different op converters but through the IR. I think I tried to do something similar by introducing a state in the converters but I barely remember. I'm ok with this.
239	Something important here: we introduce the SME lowering layer to explicitly model what is needed for SME and make the conversion to LLVM easier. However, here we are materializing a loop. I'm wondering why that loop is not generated when we move from Vector to the SME dialect and then the conversion to LLVM is mostly a 1:1 translation to the intrinsics.

c-rhodes added inline comments. Jul 13 2023, 1:14 AM

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
11	AFAIK, there are no code style rules for this sort of things apart from: The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system. And keeping an empty line between the main module include and other header files is quite common in MLIR: https://github.com/llvm/llvm-project/blob/60c9d2993bbf1594e89e1e6f72e1472eb1aeb8ef/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L13-L14 Hadn't noticed that, thanks for pointing that out
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239	Something important here: we introduce the SME lowering layer to explicitly model what is needed for SME and make the conversion to LLVM easier. However, here we are materializing a loop. I'm wondering why that loop is not generated when we move from Vector to the SME dialect and then the conversion to LLVM is mostly a 1:1 translation to the intrinsics. I've also been thinking about this. The loads/stores in SME operate on ZA array vectors or tile slices, which are 1-d scalable vectors of SVL bits, rather than an entire tile, hence the loop materialization. Perhaps if we had custom ops that deal with tile vectors the loop could be emitted when going from Vector -> SME and these would later map 1-1 with LLVM intrinsics. We'll consider what we can do here, thanks for raising this.

Thanks for reviewing!

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
216–218	No, it returns an abstract `Type` that you then have to cast to Vector. At least that's what I'm seeing 🤔 .
223	See `[MemWrite]` on L238: let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base, Variadic<Index>:$indices, nxnxv16i8:$valueToStore);
234	Good shout! I will align this with `Vector_StoreOp`.
mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp
28–30	Annoyingly, that's already there :) Good catch, thanks!
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
223–232	Apologies @c-rhodes, I missed this comment. Will be updating shortly. "the cast op should be created after the intrinsic since it represents the tile loaded by the preceding intrinsic" Do you think that the order will matter in practice? Otherwise somebody could just rewrite your suggestion as: "the cast op should be created after the intrinsic since it represents the tile loaded by the _following_ intrinsic". IIUC, the order does not matter, but I might be missing something? Regardless, we should definitely make sure that we are consistent and I am happy with "after" (i.e. your suggestion).
239	Good points, thanks! Now that you have raised this I see that this abstraction should be refined. Is it OK to iterate in future patches though? There are a few other patches that depend on one another, so I would land this as is and refactor separately. My main goal is to get the overall scaffolding in first (i.e. the "Vector to SME" pass). WDYT?

Incorporate suggestions from Diego, thanks!

Harbormaster completed remote builds in B245043: Diff 539935. Jul 13 2023, 3:46 AM

c-rhodes added inline comments. Jul 13 2023, 3:47 AM

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
39	I noticed other rewrites are in an empty namespace, do we need one here?

Add an anonymous namespace

Harbormaster completed remote builds in B245065: Diff 539959. Jul 13 2023, 4:48 AM

awarzynski added inline comments. Jul 13 2023, 4:48 AM

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
39	Done :)

Thanks for the updates Andrzej, this is really taking shape. Just a few more comments :)

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
237	example needs updating now the operand order has changed
242	nit: move to above line or indent to make it clear it applies to the memref
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
165–202	move this to bottom alongside `populateArmSMELegalizeForLLVMExportPatterns`?
227–228	this should be created before the zero, and we should add a note that get_tile_id and zero aren't chained together yet
247–252	nit: the variable names could be improved, %3 -> %vscale for example
mlir/test/Dialect/ArmSME/roundtrip.mlir
176	hasn't the operand order been changed so this comes first? Surprised this test passed
mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir
39	I think we should keep a CHECK-NOT?

Update the assembly format for TileStoreOp

Harbormaster completed remote builds in B245108: Diff 540016. Jul 13 2023, 7:14 AM

Thanks Cullen - that's a very thorough and much appreciated review! I've just updated the patch (before sending my replies), so my comments will be a bit out of sync, sorry.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
237	Argh, assembly format needs updating too. Please double check. I am trying to align with `VectorStoreOp`, but this looks off: assembly format for VectorStoreOp.
mlir/test/Dialect/ArmSME/roundtrip.mlir
176	I've not changed the assembly format yet ;-)
mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir
39	Removed by accident, ta!

Matt added a subscriber: Matt. Jul 13 2023, 2:37 PM

Just a couple minor comments but otherwise LGTM! Cheers

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239	Good points, thanks! Now that you have raised this I see that this abstraction should be refined. Is it OK to iterate in future patches though? There are a few other patches that depend on one another, so I would land this as is and refactor separately. My main goal is to get the overall scaffolding in first (i.e. the "Vector to SME" pass). WDYT? Yeah that can be done separately.
mlir/test/Dialect/ArmSME/roundtrip.mlir
173	nit: space before ":" for consistency
mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
9–10	the order here is important, should we be using CHECK-DAG?

c-rhodes added a child revision: D155306: [mlir][ArmSME] Add tile load op and extend tile store tile size support. Jul 14 2023, 9:01 AM

awarzynski added a child revision: D155365: [mlir][ArmSME] Introduce custom TypeConverter for ArmSME. Jul 15 2023, 4:20 AM

awarzynski removed a child revision: D154302: [mlir][nfc] Clarify the limitation on scalable vectors. Jul 15 2023, 4:24 AM

awarzynski added inline comments. Jul 17 2023, 7:51 AM

mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir
9–10	These will always be ordered correctly as there is a dependency expressed via `TILE_ID`: // CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8 // CHECK-DAG: %[[CAST_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8> So I think that it should be OK.

Update the assembly format for arm_sme.tile_store to match vector.store:

arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>

rather than:

arm_sme.tile_store %tile, %dest[%c0, %c0] : vector<[16]x[16]xi8>, memref<?x?xi8>

Harbormaster completed remote builds in B245858: Diff 541035. Jul 17 2023, 7:54 AM

Thanks for addressing the comments! LGTM!

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239	It sounds good to me to do this separately but this is a big abstraction change so hopefully we can do it sooner rather than later. If you think the non-loop abstraction is also useful, we could also have two levels of abstraction within the same dialect, where we go first to the non-loop one and then materialize the loop at some point within the SME dialect. The Vector dialect is a good example of this.

This revision was landed with ongoing or failed builds. Jul 18 2023, 1:07 AM

Closed by commit rG447bb5bee402: [mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME) (authored by awarzynski).

This revision was automatically updated to reflect the committed changes.

awarzynski added a commit: rG447bb5bee402: [mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME).

c-rhodes added inline comments. Jul 27 2023, 11:09 AM

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
239	It sounds good to me to do this separately but this is a big abstraction change so hopefully we can do it sooner rather than later. If you think the non-loop abstraction is also useful, we could also have two levels of abstraction within the same dialect, where we go first to the non-loop one and then materialize the loop at some point within the SME dialect. The Vector dialect is a good example of this. I've shared an update on Discourse: https://discourse.llvm.org/t/loop-materialization-in-armsme/72354 And a solution in D156467

GitHub <noreply@github.com> mentioned this in rG0e06694235bf: [mlir][ArmSME][NFC] Remove arm_sme::populateVectorTransferLoweringPatterns decl…. Thu, Dec 14, 2:51 AM

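Revision Contents

Path: Size

mlir/include/mlir/Conversion/Passes.h: 1 line
mlir/include/mlir/Conversion/Passes.td: 14 lines
mlir/include/mlir/Conversion/VectorToArmSME/VectorToArmSME.h: 26 lines
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.h: 1 line
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td: 60 lines
mlir/lib/Conversion/CMakeLists.txt: 1 line
mlir/lib/Conversion/VectorToArmSME/CMakeLists.txt: 14 lines
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp: 84 lines
mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp: 36 lines
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp: 1 line
mlir/lib/Dialect/ArmSME/IR/CMakeLists.txt: 1 line
mlir/lib/Dialect/ArmSME/Transforms/CMakeLists.txt: 1 line
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp: 113 lines
mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
mlir/test/Dialect/ArmSME/roundtrip.mlir: 17 lines
mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir: 32 lines
mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir (vector-ops.mlir): 42 lines
mlir/test/Dialect/ArmSME/vector-ops.mlir
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir: 2 lines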

Diff 541381

mlir/include/mlir/Conversion/Passes.h

(...)
#include "mlir/Conversion/SPIRVToLLVM/SPIRVToLLVMPass.h"
#include "mlir/Conversion/ShapeToStandard/ShapeToStandard.h"
#include "mlir/Conversion/TensorToLinalg/TensorToLinalgPass.h"
#include "mlir/Conversion/TensorToSPIRV/TensorToSPIRVPass.h"
#include "mlir/Conversion/TosaToArith/TosaToArith.h"
#include "mlir/Conversion/TosaToLinalg/TosaToLinalg.h"
#include "mlir/Conversion/TosaToSCF/TosaToSCF.h"
#include "mlir/Conversion/TosaToTensor/TosaToTensor.h"
+ #include "mlir/Conversion/VectorToArmSME/VectorToArmSME.h"
#include "mlir/Conversion/VectorToGPU/VectorToGPU.h"
#include "mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h"
#include "mlir/Conversion/VectorToSCF/VectorToSCF.h"
#include "mlir/Conversion/VectorToSPIRV/VectorToSPIRVPass.h"

namespace mlir {

/// Generate the code for registering conversion passes.
#define GEN_PASS_REGISTRATION
#include "mlir/Conversion/Passes.h.inc"

} // namespace mlir

#endif // MLIR_CONVERSION_PASSES_H

mlir/include/mlir/Conversion/Passes.td

(...)

let options = [

Option<"targetRank", "target-rank", "unsigned", /*default=*/"1",

"Target vector rank to which transfer ops should be lowered">,

Option<"lowerTensors", "lower-tensors", "bool", /*default=*/"false",

"Lower transfer ops that operate on tensors">

];

}

//===----------------------------------------------------------------------===//

// VectorToArmSME

c-rhodes

Not Done

//===----------------------------------------------------------------------===//

- // VectorToME

+ // VectorToSME

//===----------------------------------------------------------------------===//


//===----------------------------------------------------------------------===//

def ConvertVectorToArmSME : Pass<"convert-vector-to-arm-sme"> {

c-rhodes

Not Done

//===----------------------------------------------------------------------===//

- def ConvertVectorToSME : Pass<"convert-vector-to-sme"> {

+ def ConvertVectorToSME : Pass<"convert-vector-to-arm-sme"> {

let summary = "Lower the operations from the vector dialect into the ArmSME "

I think we should add arm since that's the full name of the dialect, would also apply to comments/filenames.


awarzynski (Author)

Done

To be perfectly honest, I feel that shorter names are better and in general feel that Arm{NEON|SVE|SME} should be renamed as {NEON|SVE|SME}. But I agree that in the meantime we should prioritise consistency.


c-rhodes

Not Done

To be perfectly honest, I feel that shorter names are better and in general feel that Arm{NEON|SVE|SME} should be renamed as {NEON|SVE|SME}. But I agree that in the meantime we should prioritise consistency.

There's a good reason for keeping Arm in the name. I think NEON has been around long enough for people to recognise it as an Arm technology, but SVE/SME have generic names and we have to be cognisant that most people probably don't know what they are. 3 extra characters to add clarity seems like a small price to pay to me.


let summary = "Lower the operations from the vector dialect into the ArmSME "

"dialect";

let description = [{

Pass that converts vector dialect operations into equivalent ArmSME dialect

operations.

}];

let dependentDialects = ["arm_sme::ArmSMEDialect"];

}

c-rhodes

Not Done

operations.

}];

- let dependentDialects = [

- "arm_sme::ArmSMEDialect"

- ];

+ let dependentDialects = ["arm_sme::ArmSMEDialect"];

}

//===----------------------------------------------------------------------===//


//===----------------------------------------------------------------------===//

// VectorToLLVM

//===----------------------------------------------------------------------===//

def ConvertVectorToLLVMPass : Pass<"convert-vector-to-llvm"> {

let summary = "Lower the operations from the vector dialect into the LLVM "

"dialect";

let description = [{

(...)

mlir/include/mlir/Conversion/VectorToArmSME/VectorToArmSME.h

This file was added.

//===- VectorToArmSME.h - Convert vector to ArmSME dialect ----------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef MLIR_CONVERSION_VECTORTOARMSME_VECTORTOARMSME_H_
#define MLIR_CONVERSION_VECTORTOARMSME_VECTORTOARMSME_H_

#include "mlir/IR/PatternMatch.h"

namespace mlir {
class Pass;

#define GEN_PASS_DECL_CONVERTVECTORTOARMSME
#include "mlir/Conversion/Passes.h.inc"

/// Collect a set of patterns to lower Vector ops to ArmSME ops that map to LLVM
/// intrinsics.
void populateVectorToArmSMEPatterns(RewritePatternSet &patterns,
                                    MLIRContext &ctx);

} // namespace mlir

#endif // MLIR_CONVERSION_VECTORTOARMSME_VECTORTOARMSME_H_

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.h

// ...
//
//===----------------------------------------------------------------------===//

#ifndef MLIR_DIALECT_ARMSME_IR_ARMSME_H
#define MLIR_DIALECT_ARMSME_IR_ARMSME_H

#include "mlir/Bytecode/BytecodeOpInterface.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
+ #include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"

#include "mlir/Dialect/ArmSME/IR/ArmSMEDialect.h.inc"

#define GET_OP_CLASSES
#include "mlir/Dialect/ArmSME/IR/ArmSME.h.inc"

#endif // MLIR_DIALECT_ARMSME_IR_ARMSME_H

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td

// ...
def ArmSME_Dialect : Dialect {
// ...
let description = [{
This dialect contains the definitions necessary to target Arm SME
scalable matrix operations.

Sources:
https://developer.arm.com/documentation/ddi0616
https://developer.arm.com/documentation/ddi0602/2023-03/SME-Instructions
}];
- let dependentDialects = ["scf::SCFDialect"];
+ let dependentDialects = ["scf::SCFDialect", "vector::VectorDialect"];
}

//===----------------------------------------------------------------------===//
// ArmSME type definitions
//===----------------------------------------------------------------------===//

class SMETileType<Type datatype, list<int> dims, string description>
: ShapedContainerType<[datatype],
And<[IsVectorOfRankPred<[2]>, allDimsScalableVectorTypePred,
IsVectorOfShape<dims>]>,
description>;

def nxnxv16i8 : SMETileType<I8, [16, 16], "vector<[16]x[16]xi8>">;
def nxnxv8i16 : SMETileType<I16, [8, 8 ], "vector<[8]x[8]xi16>">;
def nxnxv4i32 : SMETileType<I32, [4, 4 ], "vector<[4]x[4]xi32>">;
def nxnxv2i64 : SMETileType<I64, [2, 2 ], "vector<[2]x[2]xi64>">;
def nxnxv1i128 : SMETileType<I128, [1, 1 ], "vector<[1]x[1]xi128>">;
def nxnxv8f16 : SMETileType<F16, [8, 8 ], "vector<[8]x[8]xf16>">;
def nxnxv8bf16 : SMETileType<BF16, [8, 8 ], "vector<[8]x[8]xbf16>">;
def nxnxv4f32 : SMETileType<F32, [4, 4 ], "vector<[4]x[4]xf32>">;
def nxnxv2f64 : SMETileType<F64, [2, 2 ], "vector<[2]x[2]xf64>">;

def SMETile : AnyTypeOf<[nxnxv16i8, nxnxv8i16, nxnxv4i32, nxnxv2i64, nxnxv1i128,

WanderAway:

nit: Since this operation stores the entire ZA tile as opposed to more useful (virtual) tiles, it seems more appropriate to name this save or spill or something similar? I think it makes sense to distinguish this version (using str) from other stores leveraging the st1* instructions.

awarzynski:

Well, the idea is to iterate on this design and to make this Op spill an SME virtual tile, once we can specify tile ID :) (this should happen soon).

I think that you are right that it would be good to have a dedicated Op for spilling the whole array, but I will refrain from renaming just now.

nxnxv8f16, nxnxv8bf16, nxnxv4f32, nxnxv2f64]>;

// A type constraint that verifies the bitwidth of the scalar integer returned
// from 'arm_sme.get_tile_id' matches the element bitwidth of a "virtual tile".
def TileElementWidthMatchesTileID : TypesMatchWith<
"`tile_id` has the same number of bits as elements in `vector`",
"vector", "tile_id",
"IntegerType::get("
// ...
let description = [{
// ...
%za0_q = arm_sme.get_tile_id : i128
```
}];
let results = (outs AnyTypeOf<[I8, I16, I32, I64, I128]>:$tile_id);
let assemblyFormat = "attr-dict `:` type($tile_id)";
}

dcaballe:

What are the side effects of this op?


// Tile reset.

c-rhodes:

def ZeroOp : ArmSME_Op<"zero"> {

- let summary = "Initialise ZA with 0s";

+ let summary = "Initialize ZA with 0s";

let results = (outs

nit: American spellings (unfortunately 😢)

awarzynski:

AFAIK, both spellings are OK as long as you are consistent within a single file? I'm happy to change though.

c-rhodes:

> AFAIK, both spellings are OK as long as you are consistent within a single file? I'm happy to change though.

Ah ok, coming from LLVM/Clang I thought American spellings were standard, apologies if that's not the case.

dcaballe:

ZA -> ZA tile/tile register/register?

c-rhodes:

def ZeroOp : ArmSME_Op<"zero"> {

let summary = "Initialise ZA with 0s";

- let results = (outs

- VectorOfRankAndType<[2], [I8]>:$res);

+ let results = (outs nxnxv16i8:$res);

let description = [{


def ZeroOp : ArmSME_Op<"zero", [Pure]> {

let summary = "Initialize the two-dimensional ZA array with 0s";

let results = (outs nxnxv16i8:$res);

let description = [{

Initialize ZA with 0s. This operation is a convenient wrapper for the SME

`zero` intrinsic and instruction.

NOTE: At the moment it is assumed that the element type is `i8` and that

there's only one "virtual tile".

Example:

c-rhodes:

```mlir

- %0 = arm_sme.zero : vector<[16]x[16]xi8>

+ %0 = arm_sme.zero : vector<[16]x[16]xi8>

```

}];

let extraClassDeclaration = [{

nit: indentation


```mlir

%0 = arm_sme.zero : vector<[16]x[16]xi8>

```

}];

dcaballe:

Curious... `getType` or `getResultType` (or similar ones, auto-generated) should return a `VectorType` if `nxnxv...` are defined as vectors. Isn't that the case? Do we need this method for some other reason then?

awarzynski:

No, it returns an abstract `Type` that you then have to cast to `VectorType`. At least that's what I'm seeing 🤔.

let extraClassDeclaration = [{

VectorType getVectorType() {

return ::llvm::cast<VectorType>(getRes().getType());

}

}];

dcaballe:

This one should at least have memory side effects

awarzynski:

See `[MemWrite]` on L238:

let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base,
                 Variadic<Index>:$indices,
                 nxnxv16i8:$valueToStore);

let assemblyFormat = "attr-dict `:` type($res)";

}
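The ZeroOp description assumes `i8` elements and a single "virtual tile". The standalone C++ model below illustrates why that combination is special. It is a sketch, not MLIR code, and `TileGeometry` and `tileGeometry` are hypothetical names. Assuming a streaming vector length of 128 * vscale bits, it computes the runtime size of a square tile for each element width used in `SMETile` and how many such tiles ZA splits into, which comes out as exactly one for `i8`.

```cpp
#include <cassert>

// Illustrative model only (not MLIR code). Assumes SVL = 128 * vscale bits,
// so a tile with `elemBits`-wide elements has (128 / elemBits) * vscale rows
// and columns; (128 / elemBits) is the minimum size written in the type,
// e.g. [16] in vector<[16]x[16]xi8>.
struct TileGeometry {
  unsigned rows;     // rows == columns, the tiles are square
  unsigned numTiles; // how many tiles of this element width fit in ZA
};

TileGeometry tileGeometry(unsigned elemBits, unsigned vscale) {
  assert(vscale >= 1 && "vscale is at least 1");
  assert((elemBits == 8 || elemBits == 16 || elemBits == 32 ||
          elemBits == 64 || elemBits == 128) &&
         "unsupported element width");
  unsigned minDim = 128 / elemBits; // 16 for i8, 8 for i16/f16/bf16, ...
  // ZA holds (SVL/8) x (SVL/8) bytes, so it splits into elemBits/8 tiles.
  return TileGeometry{minDim * vscale, elemBits / 8};
}
```

For example, with vscale = 4 (a 512-bit streaming vector length) the single `i8` tile is 64 x 64 bytes, while ZA holds four 32-bit tiles of 16 x 16 elements each.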

def TileStoreOp : ArmSME_Op<"tile_store"> {

let summary = "Tile store operation";

let description = [{

Store a 2D SME "virtual tile" to memory.

NOTE: At the moment it is assumed that the element type is `i8` and that

there's only one "virtual tile".

dcaballe:

Would it make sense to align the operand order with the rest of store ops in MLIR? I.e., `value-to-store, dst-memref [indices] : vector-type, memref-type`?

awarzynski:

Good shout! I will align this with `Vector_StoreOp`.

Example:

c-rhodes:

```mlir

- arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>

+ arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>

```

}];

let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base,

nit: indentation

c-rhodes:

arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>

```

- }];

+ }];

let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base,

nit: indentation

```mlir

c-rhodes:

```mlir

- arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>

+ arm_sme.tile_store %0, %arg0[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>

```

}];

let arguments = (ins nxnxv16i8:$valueToStore,

example needs updating now the operand order has changed

awarzynski:

Argh, assembly format needs updating too. Please double check. I am trying to align with VectorStoreOp, but this looks off: assembly format for VectorStoreOp.


arm_sme.tile_store %0, %arg0[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>

```

}];

c-rhodes:

Variadic<Index>:$indices,

- VectorOfRankAndType<[2], [I8]>:$valueToStore);

+ nxnxv16i8:$valueToStore);

let extraClassDeclaration = [{


let arguments = (ins nxnxv16i8:$valueToStore,

Arg<AnyMemRef, "the reference to store to", [MemWrite]>:$base,

c-rhodes:

nit: move to above line or indent to make it clear it applies to the memref


Variadic<Index>:$indices);

let extraClassDeclaration = [{

MemRefType getMemRefType() {

return ::llvm::cast<MemRefType>(getBase().getType());

}

c-rhodes:

return ::llvm::cast<MemRefType>(getBase().getType());

}

- VectorType getVectorType() {

- return ::llvm::cast<VectorType>(getValueToStore().getType());

- }

+ VectorType getVectorType() {

+ return ::llvm::cast<VectorType>(getValueToStore().getType());

+ }

}];

let assemblyFormat = "$base `[` $indices `]` `,` $valueToStore attr-dict `:` "

nit: indentation

c-rhodes:

arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>

```

}];

- let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base,

+ let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base,

Variadic<Index>:$indices,

nxnxv16i8:$valueToStore);

- let extraClassDeclaration = [{

+ let extraClassDeclaration = [{

MemRefType getMemRefType() {

return ::llvm::cast<MemRefType>(getBase().getType());

}

VectorType getVectorType() {

return ::llvm::cast<VectorType>(getValueToStore().getType());

}

}];

let assemblyFormat = "$base `[` $indices `]` `,` $valueToStore attr-dict `:` "

nit: indentation


VectorType getVectorType() {

return ::llvm::cast<VectorType>(getValueToStore().getType());

}

}];

let assemblyFormat = "$valueToStore `,` $base `[` $indices `]` attr-dict "

"`:` type($base) `,` type($valueToStore)";

}
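Because the number of rows in an SME tile is only known at runtime, storing a whole tile means looping over its rows, as the patch summary notes for storing ZA to memory. The C++ model below illustrates that row-by-row shape under stated assumptions; it is not the MLIR lowering. It assumes an `i8` tile of (16 * vscale) x (16 * vscale) bytes and a row-major destination buffer with a known row stride, and `ToyI8Tile` and `storeTileRowByRow` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (an assumption, not MLIR code): an i8 SME tile with
// (16 * vscale) rows and (16 * vscale) columns, stored row-major.
struct ToyI8Tile {
  std::size_t dim;               // 16 * vscale
  std::vector<std::int8_t> data; // dim * dim bytes
  explicit ToyI8Tile(unsigned vscale)
      : dim(16u * vscale), data(dim * dim, 0) {}
};

// Writes the tile one row at a time, the way a lowering to per-row store
// intrinsics would: row r goes to mem[(row0 + r) * stride + col0 ...].
void storeTileRowByRow(const ToyI8Tile &tile, std::vector<std::int8_t> &mem,
                       std::size_t stride, std::size_t row0,
                       std::size_t col0) {
  assert(col0 + tile.dim <= stride && "a tile row must fit in a buffer row");
  assert((row0 + tile.dim) * stride <= mem.size() && "store out of bounds");
  for (std::size_t r = 0; r < tile.dim; ++r) {
    const std::int8_t *src = tile.data.data() + r * tile.dim;
    std::int8_t *dst = mem.data() + (row0 + r) * stride + col0;
    for (std::size_t c = 0; c < tile.dim; ++c)
      dst[c] = src[c];
  }
}
```

With vscale = 1 the loop performs 16 row stores; the trip count grows linearly with vscale, which is why it has to be a runtime loop rather than a fixed sequence of stores.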

//===----------------------------------------------------------------------===//
// ArmSME Intrinsic op definitions
//===----------------------------------------------------------------------===//

def MOPPredicate : ScalableVectorOfLengthAndType<[16, 8, 4, 2], [I1]>;
def MOPVector : ScalableVectorOfLengthAndType<[16, 8, 4, 2],
[I8, I16, BF16, F16, F32, F64]>;
def LDSTPredicate : ScalableVectorOfLengthAndType<[16, 8, 4, 2, 1], [I1]>;
// ...

mlir/lib/Conversion/CMakeLists.txt

# ...
add_subdirectory(ShapeToStandard)
add_subdirectory(SPIRVToLLVM)
add_subdirectory(TensorToLinalg)
add_subdirectory(TensorToSPIRV)
add_subdirectory(TosaToArith)
add_subdirectory(TosaToLinalg)
add_subdirectory(TosaToSCF)
add_subdirectory(TosaToTensor)
+ add_subdirectory(VectorToArmSME)
add_subdirectory(VectorToLLVM)
add_subdirectory(VectorToGPU)
add_subdirectory(VectorToSCF)
add_subdirectory(VectorToSPIRV)

dcaballe: sort

mlir/lib/Conversion/VectorToArmSME/CMakeLists.txt

This file was added.

add_mlir_conversion_library(MLIRVectorToArmSME
VectorToArmSME.cpp
VectorToArmSMEPass.cpp

ADDITIONAL_HEADER_DIRS
${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/VectorToArmSME

DEPENDS
MLIRConversionPassIncGen

LINK_LIBS PUBLIC
MLIRArmSMEDialect
MLIRLLVMCommonConversion
)

c-rhodes: I think this can be removed?

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp

This file was added.

//===- VectorToArmSME.cpp - Conversion from Vector to the ArmSME dialect --===//

c-rhodes:

- //===- VectorToSME.cpp - Conversion from Vector to the SME dialect --------===//

+ //===- VectorToArmSME.cpp - Conversion from Vector to the SME dialect --------===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

filename needs updating here


// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

#include "mlir/Conversion/VectorToArmSME/VectorToArmSME.h"

#include "mlir/Dialect/ArmSME/IR/ArmSME.h"

c-rhodes:

#include "mlir/Conversion/VectorToArmSME/VectorToArmSME.h"

#include "mlir/Dialect/ArmSME/IR/ArmSME.h"

nit: empty line

awarzynski:

AFAIK, there are no code style rules for this sort of thing apart from:

> The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system.

And keeping an empty line between the main module include and other header files is quite common in MLIR:

https://github.com/llvm/llvm-project/blob/60c9d2993bbf1594e89e1e6f72e1472eb1aeb8ef/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L13-L14

c-rhodes:

> AFAIK, there are no code style rules for this sort of thing apart from:
>
> > The Main Module Header file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system.
>
> And keeping an empty line between the main module include and other header files is quite common in MLIR:
>
> https://github.com/llvm/llvm-project/blob/60c9d2993bbf1594e89e1e6f72e1472eb1aeb8ef/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L13-L14

Hadn't noticed that, thanks for pointing that out.

#include "mlir/IR/BuiltinTypes.h"

#include "llvm/Support/Casting.h"

using namespace mlir;

c-rhodes:

unused?


static constexpr unsigned kMinNumElts = 16;

/// Returns true if 'val' is a splat of zero, false otherwise.

static bool isSplatZero(Type elemType, DenseElementsAttr val) {

if (llvm::isa<FloatType>(elemType))

return val && val.isSplat() && val.getSplatValue<APFloat>().isZero();

if (llvm::isa<IntegerType>(elemType))

return val && val.isSplat() && val.getSplatValue<APInt>().isZero();

return false;

}

namespace {

/// Look at `vector.transfer_write` operations and convert suitable candidates

/// to ArmSME operations, e.g.:

///

/// %cst = arith.constant dense<0> : vector<[16]x[16]xi8>

/// vector.transfer_write %cst, %arg0[%c0, %c0] : vector<[16]x[16]xi8>,

/// memref<?x?xi8>

///

/// is converted to:

///

/// %0 = arm_sme.zero : vector<[16]x[16]xi8>

dcaballe:

missing operand and types?


/// arm_sme.tile_store %0, %arg0[%c0, %c0] : memref<?x?xi8>,

dcaballe:

nit: TransferWriteToArmSMELowering?

c-rhodes:

I noticed other rewrites are in an empty namespace, do we need one here?

awarzynski:

Done :)


/// vector<[16]x[16]xi8>

///

struct TransferWriteToArmSMELowering

: public OpRewritePattern<vector::TransferWriteOp> {

using OpRewritePattern<vector::TransferWriteOp>::OpRewritePattern;

LogicalResult matchAndRewrite(vector::TransferWriteOp writeOp,

PatternRewriter &rewriter) const final {

auto vType = writeOp.getVectorType();

if (vType.getRank() != 2)

return failure();

if (vType.getShape() != ArrayRef<int64_t>({kMinNumElts, kMinNumElts}))

return failure();

if (vType.getElementType() != rewriter.getI8Type())

return failure();

if (!llvm::all_of(vType.getScalableDims(), [](bool scalable) { return scalable; }))

return failure();

auto loc = writeOp.getLoc();

dcaballe:

If `memRefType` is not used beyond the condition, you should use `isa` instead of `dyn_cast`.

if (!llvm::isa<MemRefType>(writeOp.getSource().getType()))

return failure();

auto constant = writeOp.getVector().getDefiningOp<arith::ConstantOp>();

if (!constant)

return failure();

auto denseAttr = dyn_cast<DenseElementsAttr>(constant.getValueAttr());

if (!denseAttr || !isSplatZero(vType.getElementType(), denseAttr))

return failure();

auto zero = rewriter.create<arm_sme::ZeroOp>(loc, vType);

rewriter.replaceOpWithNewOp<arm_sme::TileStoreOp>(

writeOp, zero, writeOp.getSource(), writeOp.getIndices());

return success();

}

};

} // namespace

void mlir::populateVectorToArmSMEPatterns(RewritePatternSet &patterns,

MLIRContext &ctx) {

patterns.add<TransferWriteToArmSMELowering>(&ctx);

}
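`TransferWriteToArmSMELowering` only fires for `vector<[16]x[16]xi8>`. The standalone sketch below restates those checks on a hypothetical `ToyVectorType` stand-in (not `mlir::VectorType`) to make one subtlety explicit: a rank-2 vector always carries two scalability flags, so counting the flags does not tell you whether the dimensions are actually scalable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for mlir::VectorType, for illustration only.
struct ToyVectorType {
  std::vector<std::int64_t> shape; // minimum sizes, e.g. {16, 16}
  std::vector<bool> scalableDims;  // one flag per dimension
  unsigned elementBits;            // e.g. 8 for i8
  bool elementIsInteger;
};

// Restates the pattern's intent: rank 2, shape [16, 16], i8 elements,
// and *both* dimensions scalable.
bool isArmSMEi8TileType(const ToyVectorType &t) {
  if (t.shape.size() != 2 || t.scalableDims.size() != t.shape.size())
    return false;
  if (t.shape[0] != 16 || t.shape[1] != 16)
    return false;
  if (!t.elementIsInteger || t.elementBits != 8)
    return false;
  // A fixed-size vector<16x16xi8> also has two flags (both false), so the
  // flag values must be checked, not just how many there are.
  return t.scalableDims[0] && t.scalableDims[1];
}
```

This matters because the result type of `arm_sme.zero` is constrained to `nxnxv16i8`, which requires all dimensions to be scalable.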

mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp

This file was added.

//===- VectorToArmSMEPass.cpp - Conversion from Vector to the ArmSME dialect =//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

#include "mlir/Conversion/VectorToArmSME/VectorToArmSME.h"

#include "mlir/Dialect/ArmSME/IR/ArmSME.h"

c-rhodes:

#include "mlir/Conversion/VectorToArmSME/VectorToArmSME.h"

#include "mlir/Dialect/ArmSME/IR/ArmSME.h"

nit: empty line


#include "mlir/Pass/Pass.h"

#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

namespace mlir {

#define GEN_PASS_DEF_CONVERTVECTORTOARMSME

#include "mlir/Conversion/Passes.h.inc"

} // namespace mlir

using namespace mlir;

using namespace mlir::vector;

namespace {

struct ConvertVectorToArmSMEPass

: public impl::ConvertVectorToArmSMEBase<ConvertVectorToArmSMEPass> {

void runOnOperation() override;

};

} // namespace

dcaballe:

We should move the dependencies to the .td file. There is a way to have them defined there and have the code autogenerated.

awarzynski:

Annoyingly, that's already there :) Good catch, thanks!


void ConvertVectorToArmSMEPass::runOnOperation() {

RewritePatternSet patterns(&getContext());

populateVectorToArmSMEPatterns(patterns, getContext());

(void)applyPatternsAndFoldGreedily(getOperation(), std::move(patterns));

}

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp

// ...
populateVectorToLLVMConversionPatterns(
converter, patterns, reassociateFPReductions, force32BitVectorIndices);
populateVectorToLLVMMatrixConversionPatterns(converter, patterns);

// Architecture specific augmentations.
LLVMConversionTarget target(getContext());
target.addLegalDialect<arith::ArithDialect>();
target.addLegalDialect<memref::MemRefDialect>();
target.addLegalOp<UnrealizedConversionCastOp>();
if (armNeon) {

c-rhodes: nit: unrelated change

// TODO: we may or may not want to include in-dialect lowering to
// LLVM-compatible operations here. So far, all operations in the dialect
// can be translated to LLVM IR so there is no conversion necessary.
target.addLegalDialect<arm_neon::ArmNeonDialect>();
}
if (armSVE) {
configureArmSVELegalizeForExportTarget(target);
populateArmSVELegalizeForLLVMExportPatterns(converter, patterns);
}
if (armSME) {
configureArmSMELegalizeForExportTarget(target);
populateArmSMELegalizeForLLVMExportPatterns(converter, patterns);
- arm_sme::populateVectorTransferLoweringPatterns(converter, patterns);
}
if (amx) {
configureAMXLegalizeForExportTarget(target);
populateAMXLegalizeForLLVMExportPatterns(converter, patterns);
}
if (x86Vector) {
configureX86VectorLegalizeForExportTarget(target);
populateX86VectorLegalizeForLLVMExportPatterns(converter, patterns);
}

if (failed(
applyPartialConversion(getOperation(), target, std::move(patterns))))
signalPassFailure();
}

mlir/lib/Dialect/ArmSME/IR/CMakeLists.txt

add_mlir_dialect_library(MLIRArmSMEDialect
ArmSME.cpp

ADDITIONAL_HEADER_DIRS
${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/ArmSME

DEPENDS
MLIRArmSMEIncGen

LINK_LIBS PUBLIC
MLIRIR
MLIRLLVMDialect
MLIRSCFDialect
MLIRSideEffectInterfaces

dcaballe: sort

+ MLIRVectorDialect
)

mlir/lib/Dialect/ArmSME/Transforms/CMakeLists.txt

add_mlir_dialect_library(MLIRArmSMETransforms
EnableArmStreaming.cpp
LegalizeForLLVMExport.cpp
- LowerVectorOps.cpp

ADDITIONAL_HEADER_DIRS
${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/ArmSME/Transforms

DEPENDS
MLIRArmSMETransformsIncGen

LINK_LIBS PUBLIC
MLIRArmSMEDialect
MLIRFuncDialect
MLIRLLVMCommonConversion
MLIRVectorDialect
MLIRSCFDialect
MLIRPass
)

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp

//===- LegalizeForLLVMExport.cpp - Prepare ArmSME for LLVM translation ----===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
#include "mlir/Conversion/LLVMCommon/Pattern.h"
+ #include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/ArmSME/IR/ArmSME.h"
#include "mlir/Dialect/ArmSME/Transforms/Transforms.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
+ #include "mlir/Dialect/Vector/IR/VectorOps.h"

using namespace mlir;
using namespace mlir::arm_sme;

+ static constexpr unsigned kMinNumElts = 16;
+ static constexpr unsigned kZeroZAMask = 255;

namespace {

/// Insert 'llvm.aarch64.sme.za.enable' intrinsic at the start of 'func.func'
/// ops to enable the ZA storage array.
struct EnableZAPattern : public OpRewritePattern<func::FuncOp> {
using OpRewritePattern::OpRewritePattern;
LogicalResult matchAndRewrite(func::FuncOp op,
PatternRewriter &rewriter) const final {
OpBuilder::InsertionGuard g(rewriter);
// ...
matchAndRewrite(GetTileID op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override {
// TODO: implement tile allocation, currently only tile 0 is supported.
rewriter.replaceOpWithNewOp<LLVM::ConstantOp>(op, rewriter.getI32Type(), 0);
return success();
}
};

} // namespace

- void mlir::populateArmSMELegalizeForLLVMExportPatterns(
- LLVMTypeConverter &converter, RewritePatternSet &patterns) {
- patterns.add<EnableZAPattern, DisableZAPattern>(patterns.getContext());
+ /// Lower 'arm_sme.zero'. Use 'arm_sme.cast_tile_to_vector' to model the return
+ /// value. The latter is a nop, which should be folded away (e.g. during
+ /// canonicalisation).

///

/// BEFORE:

/// ```mlir

/// %0 = arm_sme.zero : vector<[16]x[16]xi8>

/// ```

///

/// AFTER:

/// ```mlir

/// %1 = arm_sme.get_tile_id : i8

/// %2 = arm_sme.cast_tile_to_vector %1 : i8 to vector<[16]x[16]xi8>

/// "arm_sme.intr.zero"(%c255_i32) : (i32) -> ()

/// ```

struct ZeroOpConversion : public ConvertOpToLLVMPattern<ZeroOp> {
  using ConvertOpToLLVMPattern<ZeroOp>::ConvertOpToLLVMPattern;

  LogicalResult
  matchAndRewrite(ZeroOp zero, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    auto loc = zero.getLoc();

    // Get Tile ID for the `zero` intrinsic.
    // TODO: Map this to a valid `mask` for the `zero` intrinsic.
    auto tileId = rewriter.create<arm_sme::GetTileID>(
        loc, zero.getVectorType().getElementType());

    // Create 'arm_sme.intr.zero' intrinsic to zero ZA.
    // FIXME: Replace the hard-coded mask with a valid value based
    // on `tileId`.
    auto mask = rewriter.create<arith::ConstantOp>(
        loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(kZeroZAMask));
    rewriter.create<arm_sme::aarch64_sme_zero>(loc, mask);

    // Create `CastTileToVectorOp` to use it as the output
    rewriter.replaceOpWithNewOp<arm_sme::CastTileToVector>(zero, zero.getType(),
                                                           tileId);

    return success();
  }
};

/// Lower 'arm_sme.tile_store' to a loop over the rows of ZA and store each row
/// using 'arm_sme.intr.str'.
///
/// BEFORE:
/// ```mlir
/// arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>,
///     vector<[16]x[16]xi8>
/// ```
///
/// AFTER:
/// ```mlir
/// %vscale = "llvm.intr.vscale"() : () -> index
/// %c0 = arith.constant 0 : index
/// %c1 = arith.constant 1 : index
/// %c16 = arith.constant 16 : index
/// %vec_size = arith.muli %c16, %vscale : index
/// scf.for %row_idx = %c0 to %vec_size step %c1 {
///   // (...)
///   "arm_sme.intr.str"(%row_idx, %addr) : (i32, !llvm.ptr) -> ()
/// }
/// ```

struct TileStoreOpConversion : public ConvertOpToLLVMPattern<TileStoreOp> {
  using ConvertOpToLLVMPattern<TileStoreOp>::ConvertOpToLLVMPattern;

  LogicalResult
  matchAndRewrite(TileStoreOp store, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    auto loc = store.getLoc();

    // Create loop that iterates from 0 to SVLB-1 inclusive (the number of
    // vectors in ZA) and stores each ZA vector to memory.
    auto step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
    auto minElems = rewriter.create<arith::ConstantIndexOp>(loc, kMinNumElts);
    auto vscale =
        rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
    auto lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
    auto upperBound = rewriter.create<arith::MulIOp>(loc, minElems, vscale);
    auto forOp = rewriter.create<scf::ForOp>(loc, lowerBound, upperBound, step);
    rewriter.setInsertionPointToStart(forOp.getBody());

    // Create 'arm_sme.intr.str' intrinsic to store ZA vector.
    auto vnumI64 = rewriter.create<arith::IndexCastUIOp>(
        loc, rewriter.getI64Type(), forOp.getInductionVar());
    auto offset =
        rewriter.create<LLVM::ConstantOp>(loc, rewriter.getI64Type(), 0);
    Value ptr =
        getStridedElementPtr(loc, store.getMemRefType(), adaptor.getBase(),
                             ValueRange{vnumI64, offset}, rewriter);
    auto vnumI32 = rewriter.create<arith::IndexCastUIOp>(
        loc, rewriter.getI32Type(), forOp.getInductionVar());
    rewriter.create<arm_sme::aarch64_sme_str>(loc, vnumI32, ptr);

    rewriter.eraseOp(store);

    return success();
  }
};

void mlir::configureArmSMELegalizeForExportTarget(
    LLVMConversionTarget &target) {
  target.addLegalOp<scf::ForOp, scf::YieldOp, arm_sme::CastTileToVector,
                    arm_sme::CastVectorToTile, arm_sme::aarch64_sme_zero,
                    arm_sme::aarch64_sme_str, arm_sme::aarch64_sme_za_enable,
                    arm_sme::aarch64_sme_za_disable>();
  target.addLegalOp<GetTileID>();

  // Mark 'func.func' ops as legal if either:
  //   1. no 'arm_za' function attribute is present.
  //   2. the 'arm_za' function attribute is present and the first op in the
  //      function is an 'arm_sme::aarch64_sme_za_enable' intrinsic.
  target.addDynamicallyLegalOp<func::FuncOp>([&](func::FuncOp funcOp) {
    if (funcOp.isDeclaration())
      return true;
    auto firstOp = funcOp.getBody().front().begin();
    return !funcOp->hasAttr("arm_za") ||
           isa<arm_sme::aarch64_sme_za_enable>(firstOp);
  });

  // Mark 'func.return' ops as legal if either:
  //   1. no 'arm_za' function attribute is present.
  //   2. the 'arm_za' function attribute is present and there's a preceding
  //      'arm_sme::aarch64_sme_za_disable' intrinsic.
  target.addDynamicallyLegalOp<func::ReturnOp>([&](func::ReturnOp returnOp) {
    bool hasDisableZA = false;
    auto funcOp = returnOp->getParentOp();
    funcOp->walk<WalkOrder::PreOrder>(
        [&](arm_sme::aarch64_sme_za_disable op) { hasDisableZA = true; });
    return !funcOp->hasAttr("arm_za") || hasDisableZA;
  });
}

void mlir::populateArmSMELegalizeForLLVMExportPatterns(
    LLVMTypeConverter &converter, RewritePatternSet &patterns) {
  patterns.add<EnableZAPattern, DisableZAPattern>(patterns.getContext());

c-rhodes

Not Done

/// value. The latter is a nop, which should be folded away (e.g. during

- /// canonicalisation).

+ /// canonicalization).

///

/// BEFORE:

nit: American spelling


  patterns.add<TileStoreOpConversion, ZeroOpConversion>(converter);
}

c-rhodes

Not Done

/// ```mlir

- /// %0 = arm_sme.zero : vector<[16]x[16]xi8>

+ /// %0 = arm_sme.zero : vector<[16]x[16]xi8>

/// ```

nit: indentation


c-rhodes

Not Done

/// ```mlir

- /// %2 = arm_sme.cast_tile_to_vector %1 : i8 to vector<[16]x[16]xi8>

- /// "arm_sme.intr.zero"(%c255_i32) : (i32) -> ()

+ /// %2 = arm_sme.cast_tile_to_vector %1 : i8 to vector<[16]x[16]xi8>

+ /// "arm_sme.intr.zero"(%c255_i32) : (i32) -> ()

/// ```

nit: indentation


c-rhodes

Not Done

loc, zero.getResult().getType().getElementType());

- auto castTileToVec = rewriter.create<arm_sme::CastTileToVector>(

- loc, zero.getResult().getType(), tileId);

// Create 'arm_sme.intr.zero' intrinsic to zero ZA.

auto tile = rewriter.create<arith::ConstantOp>(

loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(kZeroZAMask));

rewriter.create<arm_sme::aarch64_sme_zero>(loc, tile);

- zero.replaceAllUsesWith(castTileToVec.getResult());

- rewriter.eraseOp(zero);

+ rewriter.replaceOpWithNewOp<arm_sme::CastTileToVector>(

+ zero, zero.getVectorType(), tileId);

return success();

the cast op should be created after the intrinsic since it represents the tile loaded by the preceding intrinsic


c-rhodes

Not Done

Please could you look at this again, the cast is still created before the intrinsic.


dcaballe

Not Done

Yes, you should be able to do `rewriter.replaceOpWithNewOp(zero, ...`.


awarzynski (Author)

Done

Apologies @c-rhodes, I missed this comment. Will be updating shortly.

> the cast op should be created after the intrinsic since it represents the tile loaded by the preceding intrinsic

Do you think that the order will matter in practice? Otherwise somebody could just rewrite your suggestion as:

> the cast op should be created after the intrinsic since it represents the tile loaded by the _following_ intrinsic

IIUC, the order does not matter, but I might be missing something? Regardless, we should definitely make sure that we are consistent, and I am happy with "after" (i.e. your suggestion).


c-rhodes

Not Done

/// ```mlir

- /// arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>,

- /// vector<[16]x[16]xi8

+ /// arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>,

+ /// vector<[16]x[16]xi8

/// ```

nit: indentation


c-rhodes

Not Done

nit: indentation


c-rhodes

Done

ConversionPatternRewriter &rewriter) const override {

- auto memRefType = llvm::dyn_cast<MemRefType>(store.getMemRefType());

- if (!memRefType)

- return failure();

+ auto memRefType = store.getMemRefType();

auto loc = store.getLoc();

the cast can be removed


c-rhodes

Not Done

auto castTileToVec = rewriter.create<arm_sme::CastTileToVector>(

- loc, zero.getResult().getType(), tileId);

+ loc, zero.getVectorType(), tileId);

// Create 'arm_sme.intr.zero' intrinsic to zero ZA.

getVectorType?


dcaballe

Done

add getTileId? Otherwise, it's not clear where %1 is coming from


dcaballe

Not Done

Ok, I see what you are trying to do here... and can't think of a better way. This is more like propagating information (getTileId) across different op converters but through the IR. I think I tried to do something similar by introducing a state in the converters but I barely remember. I'm ok with this.


dcaballe

Not Done

Something important here: we introduce the SME lowering layer to explicitly model what is needed for SME and make the conversion to LLVM easier. However, here we are materializing a loop. I'm wondering why that loop is not generated when we move from Vector to the SME dialect and then the conversion to LLVM is mostly a 1:1 translation to the intrinsics.


c-rhodes

Not Done

> Something important here: we introduce the SME lowering layer to explicitly model what is needed for SME and make the conversion to LLVM easier. However, here we are materializing a loop. I'm wondering why that loop is not generated when we move from Vector to the SME dialect and then the conversion to LLVM is mostly a 1:1 translation to the intrinsics.

I've also been thinking about this. The loads/stores in SME operate on ZA array vectors or tile slices, which are 1-D scalable vectors of SVL bits, rather than on an entire tile, hence the loop materialization. Perhaps if we had custom ops that deal with tile vectors, the loop could be emitted when going from Vector -> SME, and these ops would later map 1:1 to LLVM intrinsics. We'll consider what we can do here, thanks for raising this.
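As a rough illustration of the point above, here is a minimal standalone model in plain C++ (not MLIR, and not how the ArmSME dialect implements it) of why storing a whole tile turns into a loop. It assumes an 8-bit ZA tile of `svlBytes` rows by `svlBytes` bytes, with `svlBytes = 16 * vscale` as in `TileStoreOpConversion`; each loop iteration stands in for one `arm_sme.intr.str` writing one ZA vector to `base + vnum * stride0 + 0`, which is the address arithmetic the `vector-ops-to-llvm.mlir` CHECK lines expect. `ZATile`, `storeTile` and the byte-by-byte row copy are hypothetical simplifications for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimum number of 8-bit elements per ZA vector (mirrors kMinNumElts).
constexpr std::size_t kMinNumElts = 16;

// Hypothetical model of an 8-bit ZA tile: svlBytes rows ("ZA vectors") of
// svlBytes bytes each, stored row-major. svlBytes = kMinNumElts * vscale.
struct ZATile {
  std::size_t svlBytes;
  std::vector<std::uint8_t> data;

  explicit ZATile(std::size_t vscale)
      : svlBytes(kMinNumElts * vscale), data(svlBytes * svlBytes, 0) {}
};

// Writes the tile to memory one row at a time. Each iteration stands in for
// one "arm_sme.intr.str": row `vnum` goes to base + vnum * stride0 + 0.
// Returns the number of row stores issued (the loop trip count).
std::size_t storeTile(const ZATile &za, std::uint8_t *base,
                      std::size_t stride0) {
  // Rows must not overlap in the destination.
  assert(stride0 >= za.svlBytes && "stride0 must cover a full row");
  std::size_t numStores = 0;
  for (std::size_t vnum = 0; vnum < za.svlBytes; ++vnum) {
    std::uint8_t *rowPtr = base + vnum * stride0 + 0;
    for (std::size_t col = 0; col < za.svlBytes; ++col)
      rowPtr[col] = za.data[vnum * za.svlBytes + col];
    ++numStores;
  }
  return numStores;
}
```

With `vscale = 2`, for example, the model issues 32 row stores, one per ZA vector, which matches the trip count of the `scf.for` that the lowering materializes.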


awarzynski (Author)

Done

Good points, thanks! Now that you have raised this, I see that this abstraction should be refined.

Is it OK to iterate in future patches though? There are a few other patches that depend on one another, so I would land this as is and refactor separately. My main goal is to get the overall scaffolding in first (i.e. the "Vector to SME" pass). WDYT?


c-rhodes

Not Done

> Good points, thanks! Now that you have raised this, I see that this abstraction should be refined.
>
> Is it OK to iterate in future patches though? There are a few other patches that depend on one another, so I would land this as is and refactor separately. My main goal is to get the overall scaffolding in first (i.e. the "Vector to SME" pass). WDYT?

Yeah that can be done separately.


dcaballe

Not Done

It sounds good to me to do this separately, but this is a big abstraction change, so hopefully we can do it sooner rather than later. If you think the non-loop abstraction is also useful, we could also have two levels of abstraction within the same dialect, where we go first to the non-loop one and then materialize the loop at some point within the SME dialect. The Vector dialect is a good example of this.


c-rhodes

Not Done

> It sounds good to me to do this separately, but this is a big abstraction change, so hopefully we can do it sooner rather than later. If you think the non-loop abstraction is also useful, we could also have two levels of abstraction within the same dialect, where we go first to the non-loop one and then materialize the loop at some point within the SME dialect. The Vector dialect is a good example of this.

I've shared an update on Discourse: https://discourse.llvm.org/t/loop-materialization-in-armsme/72354

And a solution in D156467


c-rhodes

Not Done

move this to bottom alongside `populateArmSMELegalizeForLLVMExportPatterns`?


c-rhodes

Done

this should be created before the zero, and we should add a note that get_tile_id and zero aren't chained together yet
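Relatedly, the FIXME in `ZeroOpConversion` asks for the hard-coded `kZeroZAMask` (255) to be replaced by a mask derived from `tileId`. A hedged sketch of that mapping in plain C++ follows. It assumes the architectural behaviour of the SME ZERO instruction: bit k of its 8-bit mask selects the 64-bit tile ZAk.D, and every wider-element tile is the union of the .D tiles it overlaps. `getZeroMask` is a hypothetical helper, not part of this patch or of the MLIR API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: returns the 8-bit mask for the SME ZERO instruction
// that clears exactly the virtual tile `tileId` for elements of `elementBits`
// bits. Bit k of the mask selects the 64-bit tile ZAk.D. Returns
// std::nullopt when the tile cannot be selected on its own.
std::optional<std::uint8_t> getZeroMask(unsigned elementBits,
                                        unsigned tileId) {
  switch (elementBits) {
  case 8: // ZA0.B covers all of ZA: ZA0.D to ZA7.D.
    if (tileId == 0)
      return static_cast<std::uint8_t>(0xFFu);
    break;
  case 16: // ZAn.H = ZAn.D, ZA(n+2).D, ZA(n+4).D, ZA(n+6).D.
    if (tileId < 2)
      return static_cast<std::uint8_t>(0x55u << tileId);
    break;
  case 32: // ZAn.S = ZAn.D, ZA(n+4).D.
    if (tileId < 4)
      return static_cast<std::uint8_t>(0x11u << tileId);
    break;
  case 64: // ZAn.D is a single mask bit.
    if (tileId < 8)
      return static_cast<std::uint8_t>(1u << tileId);
    break;
  default:
    break;
  }
  // Out-of-range tile IDs and 128-bit ZAn.Q tiles (finer than the .D
  // granularity of the mask) are not expressible.
  return std::nullopt;
}
```

Under these assumptions ZA0.B yields 0xFF, which is exactly the current `kZeroZAMask`, so the existing i8-only lowering would be unchanged; ZA1.H would yield 0xAA and ZA3.S 0x88.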


c-rhodes

Not Done

nit: the variable names could be improved, %3 -> %vscale for example


mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp

This file was deleted.

	//===- LowerVectorOps.cpp - Lower vector ops to SME -----------------------===//
	//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//
	//===----------------------------------------------------------------------===//
	//
	// This file implements rewrite patterns to lower vector dialect ops to ArmSME.
	//
	//===----------------------------------------------------------------------===//

	#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
	#include "mlir/Conversion/LLVMCommon/Pattern.h"
	#include "mlir/Dialect/Arith/IR/Arith.h"
	#include "mlir/Dialect/ArmSME/IR/ArmSME.h"
	#include "mlir/Dialect/ArmSME/Transforms/Transforms.h"
	#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
	#include "mlir/Dialect/SCF/IR/SCF.h"
	#include "mlir/Dialect/Vector/IR/VectorOps.h"
	#include "mlir/IR/BuiltinOps.h"
	#include "mlir/IR/PatternMatch.h"

	using namespace mlir;
	using namespace mlir::arm_sme;

	static constexpr unsigned kMinNumElts = 16;
	static constexpr unsigned kZeroZAMask = 255;

	/// Returns true if 'val' is a splat of zero, false otherwise.
	static bool isSplatZero(Type elemType, DenseElementsAttr val) {
	  if (llvm::isa<FloatType>(elemType))
	    return val && val.isSplat() && val.getSplatValue<APFloat>().isZero();
	  if (llvm::isa<IntegerType>(elemType))
	    return val && val.isSplat() && val.getSplatValue<APInt>().isZero();
	  return false;
	}

	namespace {
	/// Lower 'vector.transfer_write' op to 'arm_sme.intr.zero' op. Currently only
	/// supports 2d scalable vector type 'vector<[16]x[16]xi8>' that maps to the
	/// ZA0.B SME virtual tile. This will be extended to support more element types.
	struct TransferWriteToArmSMEZeroLowering
	    : public ConvertOpToLLVMPattern<vector::TransferWriteOp> {
	  using ConvertOpToLLVMPattern<vector::TransferWriteOp>::ConvertOpToLLVMPattern;

	  LogicalResult
	  matchAndRewrite(vector::TransferWriteOp write, OpAdaptor adaptor,
	                  ConversionPatternRewriter &rewriter) const override {
	    auto vType = write.getVectorType();
	    if (vType.getRank() != 2)
	      return failure();
	    if (vType.getShape() != ArrayRef<int64_t>({kMinNumElts, kMinNumElts}))
	      return failure();
	    if (vType.getElementType() != rewriter.getI8Type())
	      return failure();
	    if (vType.getScalableDims().size() != 2)
	      return failure();

	    auto memRefType = llvm::dyn_cast<MemRefType>(write.getSource().getType());
	    if (!memRefType)
	      return failure();

	    auto constant = write.getVector().getDefiningOp<arith::ConstantOp>();
	    if (!constant)
	      return failure();

	    auto denseAttr = dyn_cast<DenseElementsAttr>(constant.getValueAttr());
	    if (!denseAttr || !isSplatZero(vType.getElementType(), denseAttr))
	      return failure();

	    auto loc = write.getLoc();

	    // Create 'arm_sme.intr.zero' intrinsic to zero ZA.
	    auto tile = rewriter.create<arith::ConstantOp>(
	        loc, rewriter.getI32Type(), rewriter.getI32IntegerAttr(kZeroZAMask));
	    rewriter.create<arm_sme::aarch64_sme_zero>(loc, tile);

	    // Create loop that iterates from 0 to SVLB-1 inclusive (the number of
	    // vectors in ZA) and stores each ZA vector to memory.
	    auto step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
	    auto minElems = rewriter.create<arith::ConstantIndexOp>(loc, kMinNumElts);
	    auto vscale =
	        rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
	    auto lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
	    auto upperBound = rewriter.create<arith::MulIOp>(loc, minElems, vscale);
	    auto forOp = rewriter.create<scf::ForOp>(loc, lowerBound, upperBound, step);
	    rewriter.setInsertionPointToStart(forOp.getBody());

	    // Create 'arm_sme.intr.str' intrinsic to store ZA vector.
	    auto vnumI64 = rewriter.create<arith::IndexCastUIOp>(
	        loc, rewriter.getI64Type(), forOp.getInductionVar());
	    auto offset =
	        rewriter.create<LLVM::ConstantOp>(loc, rewriter.getI64Type(), 0);
	    Value ptr = getStridedElementPtr(loc, memRefType, adaptor.getSource(),
	                                     ValueRange{vnumI64, offset}, rewriter);
	    auto vnumI32 = rewriter.create<arith::IndexCastUIOp>(
	        loc, rewriter.getI32Type(), forOp.getInductionVar());
	    rewriter.create<arm_sme::aarch64_sme_str>(loc, vnumI32, ptr);

	    rewriter.eraseOp(write);

	    return success();
	  }
	};
	} // namespace

	void mlir::arm_sme::populateVectorTransferLoweringPatterns(
	    LLVMTypeConverter &converter, RewritePatternSet &patterns) {
	  patterns.add<TransferWriteToArmSMEZeroLowering>(converter);
	}

mlir/test/Dialect/ArmSME/roundtrip.mlir

// ... (earlier lines not shown) ...

func.func @arm_sme_get_tile_id_i32() -> i32 {

// CHECK: arm_sme.get_tile_id : i32

%0 = arm_sme.get_tile_id : i32

return %0 : i32

}

// -----

func.func @arm_sme_get_tile_id_i64() -> i64 {

c-rhodes

Not Done

// -----

- func.func @arm_sme_store_tile(%tile : vector<[16]x[16]xi8>, %dest: memref<?x?xi8>) -> () {

+ func.func @arm_sme_store_tile(%tile : vector<[16]x[16]xi8>, %dest : memref<?x?xi8>) -> () {

// CHECK: arm_sme.tile_store {{.*}} : vector<[16]x[16]xi8>, memref<?x?xi8>

nit: space before ":" for consistency


// CHECK: arm_sme.get_tile_id : i64

%0 = arm_sme.get_tile_id : i64

return %0 : i64

c-rhodes

Not Done

hasn't the operand order been changed so this comes first? Surprised this test passed


awarzynski (Author)

Done

I've not changed the assembly format yet ;-)


}

// -----

func.func @arm_sme_get_tile_id_i128() -> i128 {

// CHECK: arm_sme.get_tile_id : i128

%0 = arm_sme.get_tile_id : i128

return %0 : i128

}

// -----

func.func @arm_sme_zero() -> () {

// CHECK: arm_sme.zero : vector<[16]x[16]xi8>

%0 = arm_sme.zero : vector<[16]x[16]xi8>

return

}

// -----

func.func @arm_sme_store_tile(%tile : vector<[16]x[16]xi8>, %dest : memref<?x?xi8>) -> () {

// CHECK: arm_sme.tile_store {{.*}} : memref<?x?xi8>, vector<[16]x[16]xi8>

%c0 = arith.constant 0 : index

arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>

return

}

mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir

This file was added.

				// RUN: mlir-opt %s -convert-vector-to-arm-sme -convert-vector-to-llvm="enable-arm-sme" -split-input-file | mlir-opt | FileCheck %s

				// CHECK-LABEL: @transfer_write_2d_zero_i8
				// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)
				// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
				// CHECK-DAG: %[[C255:.*]] = arith.constant 255 : i32
				// CHECK-DAG: "arm_sme.intr.zero"(%[[C255]]) : (i32) -> ()
				// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8
				// CHECK-DAG: %[[CAST_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>
				// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
				c-rhodes (Not Done): the order here is important, should we be using CHECK-DAG?
				awarzynski (Author, Done): These will always be ordered correctly as there is a dependency expressed via `TILE_ID`: `// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8` and `// CHECK-DAG: %[[CAST_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>`. So I think that it should be OK.
				// CHECK-DAG: %[[MIN_ZA_VECTORS:.*]] = arith.constant 16 : index
				// CHECK-NEXT: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64
				// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index
				// CHECK-NEXT: %[[C0_0:.*]] = arith.constant 0 : index
				// CHECK-NEXT: %[[NUM_ZA_VECTORS:.*]] = arith.muli %[[MIN_ZA_VECTORS]], %[[VSCALE_IDX]] : index
				// CHECK-NEXT: scf.for %[[VNUM:.*]] = %[[C0_0]] to %[[NUM_ZA_VECTORS]] step %[[C1]] {
				// CHECK-NEXT: %[[VNUM_I64:.*]] = arith.index_castui %[[VNUM]] : index to i64
				// CHECK-NEXT: %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
				// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
				// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
				// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[VNUM_I64]], %[[STRIDE0]] : i64
				// CHECK-NEXT: %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C0_1]] : i64
				// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
				// CHECK-NEXT: %[[VNUM_I32:.*]] = arith.index_castui %[[VNUM]] : index to i32
				// CHECK-NEXT: "arm_sme.intr.str"(%[[VNUM_I32]], %[[GEP]]) : (i32, !llvm.ptr) -> ()
				func.func @transfer_write_2d_zero_i8(%arg0 : memref<?x?xi8>) {
				%c0 = arith.constant 0 : index
				%cst = arith.constant dense<0> : vector<[16]x[16]xi8>
				vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>
				return
				}

mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir

This file was moved from mlir/test/Dialect/ArmSME/vector-ops.mlir.

	// RUN: mlir-opt %s -convert-vector-to-llvm="enable-arm-sme" -split-input-file | mlir-opt | FileCheck %s			// RUN: mlir-opt %s -convert-vector-to-arm-sme -split-input-file | mlir-opt | FileCheck %s

	// CHECK-LABEL: @transfer_write_2d_zero_i8
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)			// CHECK-LABEL: func.func @transfer_write_2d_zero(
	// CHECK-NEXT: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-SAME: %[[ARG_0:.*]]: memref<?x?xi8>) {
	// CHECK: %[[C255:.*]] = arith.constant 255 : i32			func.func @transfer_write_2d_zero(%arg0 : memref<?x?xi8>) {
	// CHECK-NEXT: "arm_sme.intr.zero"(%[[C255]]) : (i32) -> ()			// CHECK: %[[C_0:.*]] = arith.constant 0 : index
	// CHECK-NEXT: %[[C1:.*]] = arith.constant 1 : index			// CHECK: %[[ZERO:.*]] = arm_sme.zero : vector<[16]x[16]xi8>
	// CHECK-NEXT: %[[MIN_ZA_VECTORS:.*]] = arith.constant 16 : index			// CHECK: arm_sme.tile_store %[[ZERO]], %[[ARG_0]][%[[C_0]], %[[C_0]]] : memref<?x?xi8>, vector<[16]x[16]xi8>
	// CHECK-NEXT: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64			// CHECK: return
	// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index
	// CHECK-NEXT: %[[C0_0:.*]] = arith.constant 0 : index
	// CHECK-NEXT: %[[NUM_ZA_VECTORS:.*]] = arith.muli %[[MIN_ZA_VECTORS]], %[[VSCALE_IDX]] : index
	// CHECK-NEXT: scf.for %[[VNUM:.*]] = %[[C0_0]] to %[[NUM_ZA_VECTORS]] step %[[C1]] {
	// CHECK-NEXT: %[[VNUM_I64:.*]] = arith.index_castui %[[VNUM]] : index to i64
	// CHECK-NEXT: %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
	// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[VNUM_I64]], %[[STRIDE0]] : i64
	// CHECK-NEXT: %[[OFF1:.*]] = llvm.add %[[OFF0]], %[[C0_1]] : i64
	// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF1]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
	// CHECK-NEXT: %[[VNUM_I32:.*]] = arith.index_castui %[[VNUM]] : index to i32
	// CHECK-NEXT: "arm_sme.intr.str"(%[[VNUM_I32]], %[[GEP]]) : (i32, !llvm.ptr) -> ()
	func.func @transfer_write_2d_zero_i8(%arg0 : memref<?x?xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant dense<0> : vector<[16]x[16]xi8>			%cst = arith.constant dense<0> : vector<[16]x[16]xi8>
	vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>			vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>
	return			return
	}			}

	// -----			// -----

	// The following tests check the 'vector.transfer_write' -> 'arm_sme.intr.zero'			// The following tests check the 'vector.transfer_write' -> 'arm_sme.intr.zero'
	// lowering only occurs for vector types of correct rank, shape, element size			// lowering only occurs for vector types of correct rank, shape, element size
	// and number of scalable dims.			// and number of scalable dims.

	// CHECK-LABEL: @transfer_write_2d_zero__bad_type			// CHECK-LABEL: @transfer_write_2d_zero__bad_type
	// CHECK: vector.transfer_write			// CHECK: vector.transfer_write
	// CHECK-NOT: arm_sme.intr.zero			// CHECK-NOT: arm_sme.intr.zero
	c-rhodes (Not Done): I think we should keep a CHECK-NOT?
	awarzynski (Author, Done): Removed by accident, ta!
	func.func @transfer_write_2d_zero__bad_type(%arg0 : memref<?x?xi4>) {			func.func @transfer_write_2d_zero__bad_type(%arg0 : memref<?x?xi4>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant dense<0> : vector<[16]x[16]xi4>			%cst = arith.constant dense<0> : vector<[16]x[16]xi4>
	vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi4>, memref<?x?xi4>			vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi4>, memref<?x?xi4>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @transfer_write_2d_zero__bad_shape			// CHECK-LABEL: @transfer_write_2d_zero__bad_shape
	// CHECK: vector.transfer_write			// CHECK: vector.transfer_write
	// CHECK-NOT: arm_sme.intr.zero			// CHECK-NOT: arm_sme.tile_store
	func.func @transfer_write_2d_zero__bad_shape(%arg0 : memref<?x?xi8>) {			func.func @transfer_write_2d_zero__bad_shape(%arg0 : memref<?x?xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant dense<0> : vector<[8]x[8]xi8>			%cst = arith.constant dense<0> : vector<[8]x[8]xi8>
	vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[8]x[8]xi8>, memref<?x?xi8>			vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[8]x[8]xi8>, memref<?x?xi8>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @transfer_write_2d_zero__bad_rank			// CHECK-LABEL: @transfer_write_2d_zero__bad_rank
	// CHECK: vector.transfer_write			// CHECK: vector.transfer_write
	// CHECK-NOT: arm_sme.intr.zero			// CHECK-NOT: arm_sme.tile_store
	func.func @transfer_write_2d_zero__bad_rank(%arg0 : memref<?x?x?xi8>) {			func.func @transfer_write_2d_zero__bad_rank(%arg0 : memref<?x?x?xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant dense<0> : vector<[16]x[16]x[16]xi8>			%cst = arith.constant dense<0> : vector<[16]x[16]x[16]xi8>
	vector.transfer_write %cst, %arg0[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<[16]x[16]x[16]xi8>, memref<?x?x?xi8>			vector.transfer_write %cst, %arg0[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<[16]x[16]x[16]xi8>, memref<?x?x?xi8>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @transfer_write_2d_zero__non_memref_type			// CHECK-LABEL: @transfer_write_2d_zero__non_memref_type
	// CHECK: vector.transfer_write			// CHECK: vector.transfer_write
	// CHECK-NOT: arm_sme.intr.zero			// CHECK-NOT: arm_sme.tile_store
	func.func @transfer_write_2d_zero__non_memref_type(%arg0 : tensor<?x?xi8>) -> tensor<?x?xi8> {			func.func @transfer_write_2d_zero__non_memref_type(%arg0 : tensor<?x?xi8>) -> tensor<?x?xi8> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant dense<0> : vector<[16]x[16]xi8>			%cst = arith.constant dense<0> : vector<[16]x[16]xi8>
	%0 = vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, tensor<?x?xi8>			%0 = vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, tensor<?x?xi8>
	return %0 : tensor<?x?xi8>			return %0 : tensor<?x?xi8>
	}			}

	// -----			// -----

	// CHECK-LABEL: @transfer_write_2d_zero__non_zero_value			// CHECK-LABEL: @transfer_write_2d_zero__non_zero_value
	// CHECK: vector.transfer_write			// CHECK: vector.transfer_write
	// CHECK-NOT: arm_sme.intr.zero			// CHECK-NOT: arm_sme.tile_store
	func.func @transfer_write_2d_zero__non_zero_value(%arg0 : memref<?x?xi8>) {			func.func @transfer_write_2d_zero__non_zero_value(%arg0 : memref<?x?xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant dense<1> : vector<[16]x[16]xi8>			%cst = arith.constant dense<1> : vector<[16]x[16]xi8>
	vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>			vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @transfer_write_2d_zero__vec_unknown_defining_op			// CHECK-LABEL: @transfer_write_2d_zero__vec_unknown_defining_op
	// CHECK: vector.transfer_write			// CHECK: vector.transfer_write
	// CHECK-NOT: arm_sme.intr.zero			// CHECK-NOT: arm_sme.tile_store
	func.func @transfer_write_2d_zero__vec_unknown_defining_op(%arg0 : memref<?x?xi8>, %arg1 : vector<[16]x[16]xi8>) {			func.func @transfer_write_2d_zero__vec_unknown_defining_op(%arg0 : memref<?x?xi8>, %arg1 : vector<[16]x[16]xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.transfer_write %arg1, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>			vector.transfer_write %arg1, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>
	return			return
	}			}

mlir/test/Dialect/ArmSME/vector-ops.mlir

This file was moved to mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir

	// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \			// RUN: mlir-opt %s -convert-vector-to-arm-sme -enable-arm-streaming="mode=locally enable-za" \
	// RUN: -convert-vector-to-llvm="enable-arm-sme" -test-lower-to-llvm | \			// RUN: -convert-vector-to-llvm="enable-arm-sme" -test-lower-to-llvm | \
	// RUN: mlir-translate -mlir-to-llvmir | \			// RUN: mlir-translate -mlir-to-llvmir | \
	// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \			// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
	// RUN: --entry-function=entry \			// RUN: --entry-function=entry \
	// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext | \			// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext | \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() -> i32 {			func.func @entry() -> i32 {

Revision Contents

Diff 541381

mlir/include/mlir/Conversion/Passes.h

mlir/include/mlir/Conversion/Passes.td

mlir/include/mlir/Conversion/VectorToArmSME/VectorToArmSME.h

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.h

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td

mlir/lib/Conversion/CMakeLists.txt

mlir/lib/Conversion/VectorToArmSME/CMakeLists.txt

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp

mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp

mlir/lib/Dialect/ArmSME/IR/CMakeLists.txt

mlir/lib/Dialect/ArmSME/Transforms/CMakeLists.txt

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp

mlir/test/Dialect/ArmSME/roundtrip.mlir

mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir

mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir

mlir/test/Dialect/ArmSME/vector-ops.mlir

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir
