This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add pass to enable Armv9 Streaming SVE mode
ClosedPublic

Authored by c-rhodes on May 18 2023, 11:26 PM.

Details

Summary

This patch adds a pass 'enable-arm-streaming' that enables the Armv9
Scalable Matrix Extension (SME) Streaming SVE (SSVE) mode [1] by adding
either of the following attributes to 'func.func' ops:

  • arm_streaming (default)
  • arm_locally_streaming

PATCH [2 / 2] in series for RFC: https://discourse.llvm.org/t/rfc-supporting-armv9-scalable-matrix-extension-sme-streaming-sve-ssve-mode-in-mlir/70678

[1] https://developer.arm.com/documentation/ddi0616/aa

Diff Detail

Event Timeline

c-rhodes created this revision.May 18 2023, 11:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2023, 11:26 PM
c-rhodes requested review of this revision.May 18 2023, 11:26 PM
awarzynski accepted this revision.May 19 2023, 7:21 AM

Great stuff, thank you! LGTM

This revision is now accepted and ready to land.May 19 2023, 7:21 AM
c-rhodes updated this revision to Diff 523783.May 19 2023, 8:11 AM
c-rhodes edited the summary of this revision. (Show Details)

Updated namespace to remove dependent pass.

Matt added a subscriber: Matt.May 19 2023, 8:24 AM

I'll land this tomorrow unless there's any further comments by then.

dcaballe requested changes to this revision.May 23 2023, 5:58 PM

Thanks for moving forward with the implementation! I think we should move this outside of the Func dialect. Since an SME dialect is in the horizon, I would suggest that we move the implementation to Dialect/SME/Transforms and the lit tests to similar Dialect/SME folders.
We currently don't have a good place for target specific transformations without a dialect and introducing something like {lib/include}/Target/... would probably need much more discussion.

This revision now requires changes to proceed.May 23 2023, 5:58 PM
c-rhodes updated this revision to Diff 525100.May 24 2023, 4:01 AM

Move to ArmSME dialect.

Thanks for moving forward with the implementation! I think we should move this outside of the Func dialect. Since an SME dialect is in the horizon, I would suggest that we move the implementation to Dialect/SME/Transforms and the lit tests to similar Dialect/SME folders.

Thanks for reviewing. I've moved the implementation to a minimal SME dialect.

We currently don't have a good place for target specific transformations without a dialect and introducing something like {lib/include}/Target/... would probably need much more discussion.

I think a single dialect for each target (e.g. Arm, x86, ...) much like target in LLVM backend would be better than having dialects for each target feature. There's a fair amount of boilerplate in these dialects.

dcaballe accepted this revision.May 24 2023, 9:10 AM

LGTM, thanks!

I think a single dialect for each target (e.g. Arm, x86, ...) much like target in LLVM backend would be better than having dialects for each target feature. There's a fair amount of boilerplate in these dialects.

That's fair. Others can correct me if I'm wrong but I think the current "trend" in MLIR (as this is something that has been changing over time) leans more towards more minimalist dialects with a small and well-defined scopes (e.g., 'func' dialect). I think this is, to some extent, motivated by the infra as, for example, registering ops that are not used will have a negative impact in compile time. We also have mechanisms to make a dialect legal/illegal after a conversion which I think also fosters this kind of small logical grouping (e.g., it would be tedious if we had a single ARM dialect and we had to mark as legal/illegal a large part of it, one instruction at a time). There are also clear exceptions to this, such as the Vector dialect, where we have multiple layers of abstraction altogether. I think the idea of "sub-dialects" or some kind of logical grouping within a dialect was discussed at some point. That would also make sense for target-like dialects and their features. At the end of the day, all of this is flexible and it mostly depends on how we use the operations in practice.

mlir/include/mlir/Dialect/CMakeLists.txt
8

alphabetically sorted?

mlir/lib/Dialect/CMakeLists.txt
7

same

This revision is now accepted and ready to land.May 24 2023, 9:10 AM
This revision was landed with ongoing or failed builds.May 25 2023, 2:22 AM
This revision was automatically updated to reflect the committed changes.
c-rhodes marked 2 inline comments as done.

LGTM, thanks!

I think a single dialect for each target (e.g. Arm, x86, ...) much like target in LLVM backend would be better than having dialects for each target feature. There's a fair amount of boilerplate in these dialects.

That's fair. Others can correct me if I'm wrong but I think the current "trend" in MLIR (as this is something that has been changing over time) leans more towards more minimalist dialects with a small and well-defined scopes (e.g., 'func' dialect). I think this is, to some extent, motivated by the infra as, for example, registering ops that are not used will have a negative impact in compile time. We also have mechanisms to make a dialect legal/illegal after a conversion which I think also fosters this kind of small logical grouping (e.g., it would be tedious if we had a single ARM dialect and we had to mark as legal/illegal a large part of it, one instruction at a time). There are also clear exceptions to this, such as the Vector dialect, where we have multiple layers of abstraction altogether. I think the idea of "sub-dialects" or some kind of logical grouping within a dialect was discussed at some point. That would also make sense for target-like dialects and their features. At the end of the day, all of this is flexible and it mostly depends on how we use the operations in practice.

Ah ok, I wasn't aware of most of that, thanks for explaining.