This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ArmSME] Add basic lowering of vector.transfer write to zero
ClosedPublic

Authored by c-rhodes on Jun 9 2023, 2:33 AM.

Details

Summary

This patch adds support for lowering a vector.transfer_write of zeroes
with type vector<[16x16]xi8> to the SME zero {za} instruction [1],
which zeroes the entire accumulator, followed by writing the
accumulator out to memory with the str instruction [2].

This contributes to supporting a path from linalg.fill to SME.

[1] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/ZERO--Zero-a-list-of-64-bit-element-ZA-tiles-
[2] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/STR--Store-vector-from-ZA-array-
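As a rough sketch (the op spellings and variable names below are illustrative, not taken verbatim from the patch), the lowering takes IR like the first snippet and rewrites it along the lines of the second:

```mlir
// Input (sketch): a transfer_write of a zero splat covering a whole ZA tile.
%zero = arith.constant dense<0> : vector<[16x16]xi8>
vector.transfer_write %zero, %mem[%c0, %c0]
    : vector<[16x16]xi8>, memref<?x?xi8>

// Lowered (sketch): zero the whole ZA accumulator with a single intrinsic
// call, then store it out to memory tile-slice by tile-slice. The intrinsic
// names here mirror the LLVM aarch64.sme.* intrinsics but are assumptions.
"arm_sme.intr.zero"(%all_tiles_mask) : (i32) -> ()
scf.for %slice = %c0 to %num_slices step %c1 {
  "arm_sme.intr.str"(%slice_i32, %za_ptr) : (i32, !llvm.ptr<i8>) -> ()
}
```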

Diff Detail

Event Timeline

c-rhodes created this revision. Jun 9 2023, 2:33 AM
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
c-rhodes requested review of this revision. Jun 9 2023, 2:33 AM
Matt added a subscriber: Matt. Jun 9 2023, 10:21 AM

Hi @c-rhodes , thanks for working on this :)

I've made a few suggestions inline. Mostly asking for more documentation and wondering whether this could be trimmed a bit more - there are some references to ArmSME Ops, but none are defined, so perhaps some code is not needed?

Sadly, it doesn't build for me ATM (tried ToT: 15a16ef8e06e):

/llvm-project/mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h:21:10: fatal error: 'mlir/Dialect/ArmSME/ArmSMEDialect.h.inc' file not found
#include "mlir/Dialect/ArmSME/ArmSMEDialect.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h
21 ↗(On Diff #529864)

Is this needed?

24 ↗(On Diff #529864)

If there are no SME Ops then why would this header be needed?

mlir/include/mlir/Dialect/ArmSME/Transforms/Transforms.h
24
  1. There are no ArmSME ops :)
  2. There are no patterns :)

Do we need this hook?

27

[nit] There are no ArmSME ops :)

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
9

Could you add a Doxygen note to document what this file is intended for?

19

We are missing some documentation :)

Also, it would be good to document that only the i8 case is supported ATM:

  • element size: i8
  • number of tiles: 1
  • tile size: [16x16]xi8

And that this will be extended shortly :)

36

Could you replace 255 with some constant? Otherwise it's a magic number and it's unclear what it means.

c-rhodes updated this revision to Diff 530437. Jun 12 2023, 2:38 AM

Fix build error and address comments.

c-rhodes marked 4 inline comments as done. Jun 12 2023, 2:41 AM

Hi @c-rhodes , thanks for working on this :)

I've made a few suggestions inline. Mostly asking for more documentation and wondering whether this could be trimmed a bit more - there are some references to ArmSME Ops, but none are defined, so perhaps some code is not needed?

Sadly, it doesn't build for me ATM (tried ToT: 15a16ef8e06e):

/llvm-project/mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h:21:10: fatal error: 'mlir/Dialect/ArmSME/ArmSMEDialect.h.inc' file not found
#include "mlir/Dialect/ArmSME/ArmSMEDialect.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Thanks for reviewing! Apologies for the build error; the path was wrong for that .inc (and a few others), it's in the IR directory. Should build now.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h
21 ↗(On Diff #529864)

Is this needed?

It is, yeah; that .inc file is generated by TableGen and declares the dialect.

24 ↗(On Diff #529864)

If there are no SME Ops then why would this header be needed?

There are no custom ops, but the intrinsic definition LLVM_aarch64_sme_zero is still an op.

mlir/include/mlir/Dialect/ArmSME/Transforms/Transforms.h
24
  1. There are no ArmSME ops :)
  2. There are no patterns :)

Do we need this hook?

There are no custom ops, so you're right this isn't needed; removed it.

27

[nit] There are no ArmSME ops :)

The LLVM_aarch64_sme_zero intrinsic definition is an op; it's marked legal in this function.

dcaballe accepted this revision. Jun 12 2023, 11:38 PM

LGTM % ongoing comments. Thanks!

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
47

Just a direct and simple translation from the Vector dialect... That's great!

mlir/test/Dialect/ArmSME/vector_ops.mlir
17 ↗(On Diff #530437)

You can add --split-input-file and add a line with // ----- between each test for them to run independently and in parallel
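The suggested layout looks roughly like this (the RUN line and test contents are illustrative):

```mlir
// RUN: mlir-opt %s -split-input-file | FileCheck %s

// CHECK-LABEL: @first_test
func.func @first_test() {
  return
}

// -----

// CHECK-LABEL: @second_test
func.func @second_test() {
  return
}
```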

This revision is now accepted and ready to land. Jun 12 2023, 11:38 PM
c-rhodes updated this revision to Diff 530806. Jun 13 2023, 1:13 AM
c-rhodes marked 4 inline comments as done.
  • Added missing LLVMIR test mlir/test/Target/LLVMIR/arm-sme.mlir.
  • Use -split-input-file.
c-rhodes marked an inline comment as done. Jun 13 2023, 1:14 AM

LGTM % ongoing comments. Thanks!

Thanks for reviewing!

mlir/test/Dialect/ArmSME/vector_ops.mlir
17 ↗(On Diff #530437)

You can add --split-input-file and add a line with // ----- between each test for them to run independently and in parallel

Done, thank you!

awarzynski accepted this revision. Jun 13 2023, 6:57 AM

LGTM % ongoing comments. Thanks!

Same. LGTM, thanks Cullen!

c-rhodes reopened this revision. Jun 14 2023, 2:12 AM

Apologies, I shouldn't have committed this until there's consensus at the ODM; reverted.

This revision is now accepted and ready to land. Jun 14 2023, 2:12 AM

Apologies, I shouldn't have committed this until there's consensus at the ODM; reverted.

Thank you and sorry for the confusion, I should've left a clearer message when approving this.

In general, we do have +1 from Arm (myself), +1 from Google (Diego), but we should also make sure that this works for Huawei (Frank). Or, at least, that this wouldn't block Frank from pursuing the approach taken in https://reviews.llvm.org/D152080 (if that's still the preference after the ODM). In the meantime, we could merge https://reviews.llvm.org/D152878 as that change will be required regardless of whether going via the Vector dialect or not.

Thanks again for all the effort working on this!

we should also make sure that this works for Huawei (Frank)

Thanks for the consideration. I don't think this is in direct conflict with what we want to do, but again, we can discuss further during the ODM.

c-rhodes updated this revision to Diff 531666. Jun 15 2023, 2:42 AM

Rebased now D152878 has landed.

c-rhodes updated this revision to Diff 532109. Jun 16 2023, 5:45 AM

Rebase again now D153050 has landed.

we should also make sure that this works for Huawei (Frank)

Thanks for the consideration. I don't think this is in direct conflict with what we want to do, but again, we can discuss further during the ODM.

@WanderAway Could you confirm that there are no objections from your side after the ODM? Thanks!

WanderAway accepted this revision. Jun 22 2023, 11:12 AM

@WanderAway Could you confirm that there are no objections from your side after the ODM? Thanks!

Yup, no objections here.

c-rhodes updated this revision to Diff 534515. Jun 26 2023, 6:20 AM
c-rhodes edited the summary of this revision. (Show Details)

This patch was pretty basic, so I've made some improvements/fixes:

  • Write ZA out to memory after zero {za}.
  • Add integration test (runs on QEMU).
  • Check the vector.transfer_write value is a dense arith.constant of zeroes.
  • Simplify tests in mlir/test/Dialect/ArmSME/vector_ops.mlir to pass the memref as an argument rather than setting it up in each function. Also added 3 more tests that check lowering doesn't happen for:
    • non-memref types.
    • non-zero values.
    • a vector.transfer_write whose value's defining op isn't visible; this previously crashed if the value was passed as a function argument.
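The new match conditions can be sketched as follows (the IR is illustrative, not lifted from the tests):

```mlir
// Matches: the written value is a visible dense constant of zeroes.
%zero = arith.constant dense<0> : vector<[16x16]xi8>
vector.transfer_write %zero, %mem[%c0, %c0]
    : vector<[16x16]xi8>, memref<?x?xi8>

// Does not match: a non-zero constant.
%ones = arith.constant dense<1> : vector<[16x16]xi8>
vector.transfer_write %ones, %mem[%c0, %c0]
    : vector<[16x16]xi8>, memref<?x?xi8>

// Does not match: the value's defining op is not visible (it is a function
// argument), so the pattern must bail out rather than crash.
func.func @arg_value(%v : vector<[16x16]xi8>, %mem : memref<?x?xi8>) {
  %c0 = arith.constant 0 : index
  vector.transfer_write %v, %mem[%c0, %c0]
      : vector<[16x16]xi8>, memref<?x?xi8>
  return
}
```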

I was a bit unsure whether to abandon this and post a new patch given this has already been approved and there are a few changes; happy to do that if people prefer.

Thanks for the updates, I've left some comments inline.

I was a bit unsure whether to abandon this and post a new patch given this has already been approved and there's a few changes, happy to do that if people prefer.

IMO we can continue here. You are basically refining the initial design rather than proposing something completely new.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
123

Could you update "mlir/test/Target/LLVMIR/arm-sme.mlir" as well?

mlir/test/Dialect/ArmSME/vector_ops.mlir
23 ↗(On Diff #534515)

Doesn't this store the same array vector on each iteration? IIUC, the only thing that's changing is the destination.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
11 ↗(On Diff #534515)

[nit] %i1 (variable) is easy to confuse with i1 (type). I would use c1 instead. And if you need different types, c1_idx and c1_i32.

17 ↗(On Diff #534515)

[nit] It's worth elaborating what svl stands for in this test. And how do we know that it's going to be "streaming vector length" rather than "vector length"?

33 ↗(On Diff #534515)

[nit] Did you mean init_1 instead? Similar comment for addition below.

34 ↗(On Diff #534515)

[nit] In this case the induction variable has a very specific meaning. Also, it can be confusing that there 16 bytes being loaded, but the induction variable is only increased by 1.

36 ↗(On Diff #534515)
38 ↗(On Diff #534515)

[nit] Same point as for the previous scf.for

40 ↗(On Diff #534515)
50 ↗(On Diff #534515)

It would be good to also verify that the result is != 1 when the elements in the matrix are different. Would it be possible to set one element to 123 and verify that the result is 123?

74 ↗(On Diff #534515)

Shouldn't this be ... ?

82 ↗(On Diff #534515)

It would be good to also verify that the result is != 0 when the elements in the matrix are different. Would it be possible to set one element to 321 and see what happens?

awarzynski added inline comments. Jun 28 2023, 3:56 AM
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
21–24 ↗(On Diff #534515)

I am fine with loops, but not sure about the comment 🤔 .

vector.store - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir#L9-L20

vector.transfer_write - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Dialect/Linalg/vectorization-scalable.mlir#L27

Like I said, loops are fine (I really like the simplicity). I am just curious what exactly is missing :) But we can investigate that independently of this patch.

c-rhodes updated this revision to Diff 536170. Jun 30 2023, 3:21 AM

Rebase and address comments.

c-rhodes marked 16 inline comments as done. Jun 30 2023, 3:28 AM
c-rhodes added inline comments.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
123

Could you update "mlir/test/Target/LLVMIR/arm-sme.mlir" as well?

Good spot, cheers.

mlir/test/Dialect/ArmSME/vector_ops.mlir
23 ↗(On Diff #534515)

Doesn't this store the same array vector on each iteration? IIUC, the only thing that's changing is the destination.

Doh! It does, yeah, good spot; fixed.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
17 ↗(On Diff #534515)

... And how do we know that it's going to be "streaming vector length" rather than "vector length"?

Streaming mode is enabled by the -enable-arm-streaming pass.

21–24 ↗(On Diff #534515)

I am fine with loops, but not sure about the comment 🤔 .

vector.store - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir#L9-L20

vector.transfer_write - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Dialect/Linalg/vectorization-scalable.mlir#L27

Like I said, loops are fine (I really like the simplicity). I am just curious what exactly is missing :) But we can investigate that independently of this patch.

Apologies, I've updated the comment to reflect the actual problem.

33 ↗(On Diff #534515)

[nit] Did you mean init_1 instead? Similar comment for addition below.

It was intended as first init / second init but I can see how that's confusing, updated to your suggestion.

38 ↗(On Diff #534515)

[nit] Same point as for the previous scf.for

Not sure col is applicable here; changed it to (row) offset.

74 ↗(On Diff #534515)

Shouldn't this be ... ?

It should; good spot.

awarzynski accepted this revision. Jul 3 2023, 2:24 AM

Thanks for addressing my comments! I've tested locally and can confirm that the integration test runs correctly 🎉. Great job, LGTM!

Just a few final nits that you can either ignore or address when merging.

Btw, most test files use a hyphen "-" rather than an underscore "_": "vector_ops.mlir" --> "vector-ops.mlir"?

As this remains within the scope of the original submission, I think it's fine to merge without waiting for the other reviewers to confirm (I'm just being mindful that this change has evolved since being originally OK'ed).

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
42

Just a nit. I feel that it would be good to make it clear, consistently, that these are "virtual" SME tiles.

53

Replace 16 with a constant (it's repeated a few times).

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
37–38 ↗(On Diff #536170)

IMHO, readability would be better without wrapping lines like this one, and the 80-char limit is rarely observed in tests. This is a nit ;-)

83 ↗(On Diff #536170)

A comment might make it easier for our future selves to figure out where the "magic" 60 comes from :) This is a nit.

This revision was automatically updated to reflect the committed changes.
c-rhodes marked 7 inline comments as done.
c-rhodes marked 4 inline comments as done. Jul 3 2023, 3:35 AM

Thanks for addressing my comments! I've tested locally and can confirm that the integration test runs correctly 🎉. Great job, LGTM!

Just a few final nits that you can either ignore or address when merging.

Btw, most test files use a hyphen "-" rather than an underscore "_": "vector_ops.mlir" --> "vector-ops.mlir"?

As this remains within the scope of the original submission, I think it's fine to merge without waiting for the other reviewers to confirm (I'm just being mindful that this change has evolved since being originally OK'ed).

Thanks for reviewing again! Addressed all your comments before committing. Cheers.