This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ArmSME] Add basic lowering of vector.transfer_write to zero
ClosedPublic

Authored by c-rhodes on Jun 9 2023, 2:33 AM.

Details

Summary

This patch adds support for lowering a vector.transfer_write of zeroes
with type vector<[16x16]xi8> to the SME zero {za} instruction [1], which
zeroes the entire accumulator, followed by writing the accumulator out to
memory with the str instruction [2].

This contributes to supporting a path from linalg.fill to SME.

[1] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/ZERO--Zero-a-list-of-64-bit-element-ZA-tiles-
[2] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/STR--Store-vector-from-ZA-array-
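
For illustration, a minimal sketch of the kind of IR this lowering is meant to match (function and argument names are illustrative, not taken from the patch):

func.func @transfer_write_2d_zero_i8(%arg0 : memref<?x?xi8>) {
  %c0 = arith.constant 0 : index
  %zero = arith.constant dense<0> : vector<[16x16]xi8>
  vector.transfer_write %zero, %arg0[%c0, %c0] {in_bounds = [true, true]}
    : vector<[16x16]xi8>, memref<?x?xi8>
  return
}

After lowering, this roughly corresponds to one zero {za} of the whole accumulator followed by str-based stores of the tile out to %arg0.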

Diff Detail

Event Timeline

c-rhodes created this revision.Jun 9 2023, 2:33 AM
Herald added a reviewer: ftynse.
Herald added a project: Restricted Project.
c-rhodes requested review of this revision.Jun 9 2023, 2:33 AM
Matt added a subscriber: Matt.Jun 9 2023, 10:21 AM

Hi @c-rhodes , thanks for working on this :)

I've made a few suggestions inline. Mostly asking for more documentation and wondering whether this could be trimmed a bit more - there are some references to ArmSME Ops, but none are defined, so perhaps some code is not needed?

Sadly, it doesn't build for me ATM (tried ToT: 15a16ef8e06e):

/llvm-project/mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h:21:10: fatal error: 'mlir/Dialect/ArmSME/ArmSMEDialect.h.inc' file not found
#include "mlir/Dialect/ArmSME/ArmSMEDialect.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h
21 ↗(On Diff #529864)

Is this needed?

24 ↗(On Diff #529864)

If there are no SME Ops then why would this header be needed?

mlir/include/mlir/Dialect/ArmSME/Transforms/Transforms.h
24
  1. There are no ArmSME ops :)
  2. There are no patterns :)

Do we need this hook?

27

[nit] There are no ArmSME ops :)

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
9

Could you add a Doxygen note to document what this file is intended for?

19

We are missing some documentation :)

Also, it would be good to document that only the i8 case is supported ATM:

  • element size: i8
  • number of tiles: 1
  • tile size: [16x16]xi8

And that this will be extended shortly :)

36

Could you replace 255 with some constant? Otherwise it's a magic number and it's unclear what it means.
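
For context, a rough sketch of what the mask encodes, assuming the zero intrinsic takes an i32 tile mask whose bits 0-7 select the 64-bit tiles ZA0.D-ZA7.D (op shown in generic form, spelling purely illustrative):

%all_tiles = arith.constant 255 : i32
// 255 = 0b11111111: every 64-bit ZA tile is selected, so the whole ZA array is zeroed.
"arm_sme.intr.zero"(%all_tiles) : (i32) -> ()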

c-rhodes updated this revision to Diff 530437.Jun 12 2023, 2:38 AM

Fix build error and address comments.

c-rhodes marked 4 inline comments as done.Jun 12 2023, 2:41 AM

Hi @c-rhodes , thanks for working on this :)

I've made a few suggestions inline. Mostly asking for more documentation and wondering whether this could be trimmed a bit more - there are some references to ArmSME Ops, but none are defined, so perhaps some code is not needed?

Sadly, it doesn't build for me ATM (tried ToT: 15a16ef8e06e):

/llvm-project/mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h:21:10: fatal error: 'mlir/Dialect/ArmSME/ArmSMEDialect.h.inc' file not found
#include "mlir/Dialect/ArmSME/ArmSMEDialect.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Thanks for reviewing! Apologies for the build error; the path was wrong for that .inc (and a few others), it's in the IR directory. Should build now.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h
21 ↗(On Diff #529864)

Is this needed?

It is, yeah; that .inc file is generated by TableGen and declares the dialect.

24 ↗(On Diff #529864)

If there are no SME Ops then why would this header be needed?

There are no custom ops, but the intrinsic definition LLVM_aarch64_sme_zero is still an op.

mlir/include/mlir/Dialect/ArmSME/Transforms/Transforms.h
24
  1. There are no ArmSME ops :)
  2. There are no patterns :)

Do we need this hook?

There are no custom ops, so you're right this isn't needed; removed it.

27

[nit] There are no ArmSME ops :)

The LLVM_aarch64_sme_zero intrinsic definition is an op; it's marked legal in this function.

dcaballe accepted this revision.Jun 12 2023, 11:38 PM

LGTM % ongoing comments. Thanks!

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
47

Just a direct and simple translation from the Vector dialect... That's great!

mlir/test/Dialect/ArmSME/vector_ops.mlir
18

You can add --split-input-file and a line with // ----- between each test so they run independently and in parallel.
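
For example, a minimal sketch (the pass flags from this patch's actual RUN line are omitted):

// RUN: mlir-opt %s -split-input-file | FileCheck %s

// CHECK-LABEL: @first_case
func.func @first_case() {
  return
}

// -----

// CHECK-LABEL: @second_case
func.func @second_case() {
  return
}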

This revision is now accepted and ready to land.Jun 12 2023, 11:38 PM
c-rhodes updated this revision to Diff 530806.Jun 13 2023, 1:13 AM
c-rhodes marked 4 inline comments as done.
  • Added missing LLVMIR test mlir/test/Target/LLVMIR/arm-sme.mlir.
  • Use -split-input-file.
c-rhodes marked an inline comment as done.Jun 13 2023, 1:14 AM

LGTM % ongoing comments. Thanks!

Thanks for reviewing!

mlir/test/Dialect/ArmSME/vector_ops.mlir
18

You can add --split-input-file and a line with // ----- between each test so they run independently and in parallel.

Done, thank you!

awarzynski accepted this revision.Jun 13 2023, 6:57 AM

LGTM % ongoing comments. Thanks!

Same. LGTM, thanks Cullen!

c-rhodes reopened this revision.Jun 14 2023, 2:12 AM

Apologies, I shouldn't have committed this until there's consensus at the ODM; reverted.

This revision is now accepted and ready to land.Jun 14 2023, 2:12 AM

Apologies, I shouldn't have committed this until there's consensus at the ODM; reverted.

Thank you and sorry for the confusion, I should've left a clearer message when approving this.

In general, we do have +1 from Arm (myself) and +1 from Google (Diego), but we should also make sure that this works for Huawei (Frank). Or, at least, that this wouldn't block Frank from pursuing the approach taken in https://reviews.llvm.org/D152080 (if that's still the preference after the ODM). In the meantime, we could merge https://reviews.llvm.org/D152878, as that change will be required regardless of whether we go via the Vector dialect or not.

Thanks again for all the effort working on this!

we should also make sure that this works for Huawei (Frank)

Thanks for the consideration. I don't think this is in direct conflict with what we want to do, but again, we can discuss further during the ODM.

c-rhodes updated this revision to Diff 531666.Jun 15 2023, 2:42 AM

Rebased now D152878 has landed.

c-rhodes updated this revision to Diff 532109.Jun 16 2023, 5:45 AM

Rebase again now D153050 has landed.

we should also make sure that this works for Huawei (Frank)

Thanks for the consideration. I don't think this is in direct conflict with what we want to do, but again, we can discuss further during the ODM.

@WanderAway Could you confirm that there are no objections from your side after the ODM? Thanks!

WanderAway accepted this revision.Jun 22 2023, 11:12 AM

@WanderAway Could you confirm that there are no objections from your side after the ODM? Thanks!

Yup, no objections here.

c-rhodes updated this revision to Diff 534515.Jun 26 2023, 6:20 AM
c-rhodes edited the summary of this revision.

This patch was pretty basic so I've made some improvements/fixes:

  • Write ZA out to memory after zero {za}.
  • Add integration test (runs on QEMU).
  • Check the vector.transfer_write value is a dense arith.constant of zeroes.
  • Simplify tests in mlir/test/Dialect/ArmSME/vector_ops.mlir to pass the memref as an argument rather than setting it up in each function. Also added 3 more tests that check lowering doesn't happen for:
    • non-memref types.
    • non-zero values.
    • a vector.transfer_write where the defining op of the written value isn't visible; this previously crashed if the value was passed as an argument (see the sketch below).
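
A sketch of that last case (function and argument names are illustrative, not the exact test from the patch): the written value is a block argument, so there is no visible defining op to inspect and the pattern should bail out rather than crash:

func.func @transfer_write_2d_arg_value(%vec : vector<[16x16]xi8>, %mem : memref<?x?xi8>) {
  %c0 = arith.constant 0 : index
  // %vec has no defining op here, so the zero-constant check cannot apply.
  vector.transfer_write %vec, %mem[%c0, %c0] {in_bounds = [true, true]}
    : vector<[16x16]xi8>, memref<?x?xi8>
  return
}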

I was a bit unsure whether to abandon this and post a new patch, given this has already been approved and there are a few changes; happy to do that if people prefer.

Thanks for the updates, I've left some comments inline.

I was a bit unsure whether to abandon this and post a new patch given this has already been approved and there's a few changes, happy to do that if people prefer.

IMO we can continue here. You are basically refining the initial design rather than proposing something completely new.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
123

Could you update "mlir/test/Target/LLVMIR/arm-sme.mlir" as well?

mlir/test/Dialect/ArmSME/vector_ops.mlir
24

Doesn't this store the same array vector on each iteration? IIUC, the only thing that's changing is the destination.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
12

[nit] %i1 (variable) is easy to confuse with i1 (type). I would use c1 instead. And if you need different types, c1_idx and c1_i32.

18

[nit] It's worth elaborating what svl stands for in this test. And how do we know that it's going to be "streaming vector length" rather than "vector length"?

34

[nit] Did you mean init_1 instead? Similar comment for addition below.

35

[nit] In this case the induction variable has a very specific meaning. Also, it can be confusing that there are 16 bytes being loaded, but the induction variable is only increased by 1.

37
39

[nit] Same point as for the previous scf.for

41
51

It would be good to also verify that the result is != 1 when the elements in the matrix are different. Would it be possible to set one element to 123 and verify that the result is 123?

75

Shouldn't this be ... ?

83

It would be good to also verify that the result is != 0 when the elements in the matrix are different. Would it be possible to set one element to 321 and see what happens?

awarzynski added inline comments.Jun 28 2023, 3:56 AM
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
22–25

I am fine with loops, but not sure about the comment 🤔 .

vector.store - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir#L9-L20

vector.transfer_write - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Dialect/Linalg/vectorization-scalable.mlir#L27

Like I said, loops are fine (I really like the simplicity). I am just curious what exactly is missing :) But we can investigate that independently of this patch.

c-rhodes updated this revision to Diff 536170.Jun 30 2023, 3:21 AM

Rebase and address comments.

c-rhodes marked 16 inline comments as done.Jun 30 2023, 3:28 AM
c-rhodes added inline comments.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
123

Could you update "mlir/test/Target/LLVMIR/arm-sme.mlir" as well?

Good spot, cheers.

mlir/test/Dialect/ArmSME/vector_ops.mlir
24

Doesn't this store the same array vector on each iteration? IIUC, the only thing that's changing is the destination.

Doh! It does yeah good spot, fixed.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
18

... And how do we know that it's going to be "streaming vector length" rather than "vector length"?

Streaming mode is enabled by the -enable-arm-streaming pass.
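
As a rough illustration (assuming vector.vscale reflects the streaming vector length once -enable-arm-streaming is applied), SVL in bytes could be derived as:

%min_elts = arith.constant 16 : index   // minimum (streaming) vector length is 128 bits = 16 bytes
%vscale = vector.vscale
%svl_b = arith.muli %min_elts, %vscale : index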

22–25

I am fine with loops, but not sure about the comment 🤔 .

vector.store - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir#L9-L20

vector.transfer_write - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Dialect/Linalg/vectorization-scalable.mlir#L27

Like I said, loops are fine (I really like the simplicity). I am just curious what exactly is missing :) But we can investigate that independently of this patch.

Apologies, I've updated the comment to reflect the actual problem.

34

[nit] Did you mean init_1 instead? Similar comment for addition below.

It was intended as first init / second init, but I can see how that's confusing; updated to your suggestion.

39

[nit] Same point as for the previous scf.for

Not sure col is applicable here; changed it to (row) offset.

75

Shouldn't this be ... ?

it should. Good spot.

awarzynski accepted this revision.Jul 3 2023, 2:24 AM

Thanks for addressing my comments! I've tested locally and can confirm that the integration test runs correctly 🎉. Great job, LGTM!

Just a few final nits that you can either ignore or address when merging.

Btw, most test files use a hyphen "-" rather than an underscore "_": "vector_ops.mlir" --> "vector-ops.mlir"?

As this remains within the scope of the original submission, I think it's fine to merge without waiting for the other reviewers to confirm (I'm just being mindful that this change has evolved since being originally OK'ed).

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
41

Just a nit. I feel that it would be good to make it clear, consistently, that these are "virtual" SME tiles.

52

Replace 16 with a constant (it's repeated a few times).

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
37–38

IMHO, readability would be better without wrapping lines like this one. And the 80-char limit is rarely observed in tests. This is a nit ;-)

83

A comment might make it easier for our future selves to figure out where the "magic" 60 comes from :) This is a nit.

This revision was automatically updated to reflect the committed changes.
c-rhodes marked 7 inline comments as done.
c-rhodes marked 4 inline comments as done.Jul 3 2023, 3:35 AM

Thanks for addressing my comments! I've tested locally and can confirm that the integration test runs correctly 🎉. Great job, LGTM!

Just a few final nits that you can either ignore or address when merging.

Btw, most test files use a hyphen "-" rather than an underscore "_": "vector_ops.mlir" --> "vector-ops.mlir"?

As this remains within the scope of the original submission, I think it's fine to merge without waiting for the other reviewers to confirm (I'm just being mindful that this change has evolved since being originally OK'ed).

Thanks for reviewing again! Addressed all your comments before committing. Cheers.