This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ArmSME] Add basic lowering of vector.transfer_write to zero
ClosedPublic

Authored by c-rhodes on Jun 9 2023, 2:33 AM.

Details

Summary

This patch adds support for lowering a vector.transfer_write of zeroes
with type vector<[16x16]xi8> to the SME zero {za} instruction [1], which
zeroes the entire accumulator, followed by writing the accumulator out to
memory with the str instruction [2].

This contributes to supporting a path from linalg.fill to SME.

[1] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/ZERO--Zero-a-list-of-64-bit-element-ZA-tiles-
[2] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/STR--Store-vector-from-ZA-array-
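
For illustration, a minimal sketch of the kind of IR this lowering is meant to match (function and argument names are illustrative, not taken from the patch):

func.func @transfer_write_2d_zero_i8(%arg0 : memref<?x?xi8>) {
  %c0 = arith.constant 0 : index
  %zero = arith.constant dense<0> : vector<[16x16]xi8>
  vector.transfer_write %zero, %arg0[%c0, %c0] {in_bounds = [true, true]}
    : vector<[16x16]xi8>, memref<?x?xi8>
  return
}

After lowering, this roughly corresponds to one zero {za} of the whole accumulator followed by str-based stores of the tile out to %arg0.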

Diff Detail

Event Timeline

c-rhodes created this revision.Jun 9 2023, 2:33 AM
Herald added a reviewer: ftynse.
Herald added a project: Restricted Project.
c-rhodes requested review of this revision.Jun 9 2023, 2:33 AM
Matt added a subscriber: Matt.Jun 9 2023, 10:21 AM

Hi @c-rhodes , thanks for working on this :)

I've made a few suggestions inline. Mostly asking for more documentation and wondering whether this could be trimmed a bit more - there are some references to ArmSME Ops, but none are defined, so perhaps some code is not needed?

Sadly, it doesn't build for me ATM (tried ToT: 15a16ef8e06e):

/llvm-project/mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h:21:10: fatal error: 'mlir/Dialect/ArmSME/ArmSMEDialect.h.inc' file not found
#include "mlir/Dialect/ArmSME/ArmSMEDialect.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h
21 ↗(On Diff #529864)

Is this needed?

24 ↗(On Diff #529864)

If there are no SME Ops then why would this header be needed?

mlir/include/mlir/Dialect/ArmSME/Transforms/Transforms.h
24
  1. There are no ArmSME ops :)
  2. There are no patterns :)

Do we need this hook?

27

[nit] There are no ArmSME ops :)

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
9

Could you add a Doxygen note to document what this file is intended for?

19

We are missing some documentation :)

Also, it would be good to document that only the i8 case is supported ATM:

  • element size: i8
  • number of tiles: 1
  • tile size: [16x16]xi8

And that this will be extended shortly :)

36

Could you replace 255 with some constant? Otherwise it's a magic number and it's unclear what it means.
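
For context, a rough sketch of what the mask encodes, assuming the zero intrinsic takes an i32 tile mask whose bits 0-7 select the 64-bit tiles ZA0.D-ZA7.D (op shown in generic form, spelling purely illustrative):

%all_tiles = arith.constant 255 : i32
// 255 = 0b11111111: every 64-bit ZA tile is selected, so the whole ZA array is zeroed.
"arm_sme.intr.zero"(%all_tiles) : (i32) -> ()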

c-rhodes updated this revision to Diff 530437.Jun 12 2023, 2:38 AM

Fix build error and address comments.

c-rhodes marked 4 inline comments as done.Jun 12 2023, 2:41 AM

Hi @c-rhodes , thanks for working on this :)

I've made a few suggestions inline. Mostly asking for more documentation and wondering whether this could be trimmed a bit more - there are some references to ArmSME Ops, but none are defined, so perhaps some code is not needed?

Sadly, it doesn't build for me ATM (tried ToT: 15a16ef8e06e):

/llvm-project/mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h:21:10: fatal error: 'mlir/Dialect/ArmSME/ArmSMEDialect.h.inc' file not found
#include "mlir/Dialect/ArmSME/ArmSMEDialect.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Thanks for reviewing! Apologies for the build error; the path was wrong for that .inc (and a few others), it's in the IR directory. Should build now.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSMEDialect.h
21 ↗(On Diff #529864)

Is this needed?

It is, yeah; that .inc file is generated by TableGen and declares the dialect.

24 ↗(On Diff #529864)

If there are no SME Ops then why would this header be needed?

There are no custom ops, but the intrinsic definition LLVM_aarch64_sme_zero is still an op.

mlir/include/mlir/Dialect/ArmSME/Transforms/Transforms.h
24
  1. There are no ArmSME ops :)
  2. There are no patterns :)

Do we need this hook?

There are no custom ops, so you're right this isn't needed; removed it.

27

[nit] There are no ArmSME ops :)

The LLVM_aarch64_sme_zero intrinsic definition is an op; it's marked legal in this function.

dcaballe accepted this revision.Jun 12 2023, 11:38 PM

LGTM % ongoing comments. Thanks!

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
47

Just a direct and simple translation from the Vector dialect... That's great!

mlir/test/Dialect/ArmSME/vector_ops.mlir
18

You can add --split-input-file and a line with // ----- between each test so they run independently and in parallel.
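
For example, a minimal sketch (the pass flags from this patch's actual RUN line are omitted):

// RUN: mlir-opt %s -split-input-file | FileCheck %s

// CHECK-LABEL: @first_case
func.func @first_case() {
  return
}

// -----

// CHECK-LABEL: @second_case
func.func @second_case() {
  return
}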

This revision is now accepted and ready to land.Jun 12 2023, 11:38 PM
c-rhodes updated this revision to Diff 530806.Jun 13 2023, 1:13 AM
c-rhodes marked 4 inline comments as done.
  • Added missing LLVMIR test mlir/test/Target/LLVMIR/arm-sme.mlir.
  • Use -split-input-file.
c-rhodes marked an inline comment as done.Jun 13 2023, 1:14 AM

LGTM % ongoing comments. Thanks!

Thanks for reviewing!

mlir/test/Dialect/ArmSME/vector_ops.mlir
18

You can add --split-input-file and a line with // ----- between each test so they run independently and in parallel.

Done, thank you!

awarzynski accepted this revision.Jun 13 2023, 6:57 AM

LGTM % ongoing comments. Thanks!

Same. LGTM, thanks Cullen!

c-rhodes reopened this revision.Jun 14 2023, 2:12 AM

Apologies, I shouldn't have committed this until there's consensus at the ODM; reverted.

This revision is now accepted and ready to land.Jun 14 2023, 2:12 AM

Apologies, I shouldn't have committed this until there's consensus at the ODM; reverted.

Thank you and sorry for the confusion, I should've left a clearer message when approving this.

In general, we do have +1 from Arm (myself) and +1 from Google (Diego), but we should also make sure that this works for Huawei (Frank). Or, at least, that this wouldn't block Frank from pursuing the approach taken in https://reviews.llvm.org/D152080 (if that's still the preference after the ODM). In the meantime, we could merge https://reviews.llvm.org/D152878, as that change will be required regardless of whether we go via the Vector dialect or not.

Thanks again for all the effort working on this!

we should also make sure that this works for Huawei (Frank)

Thanks for the consideration. I don't think this is in direct conflict with what we want to do, but again, we can discuss further during the ODM.

c-rhodes updated this revision to Diff 531666.Jun 15 2023, 2:42 AM

Rebased now D152878 has landed.

c-rhodes updated this revision to Diff 532109.Jun 16 2023, 5:45 AM

Rebase again now D153050 has landed.

we should also make sure that this works for Huawei (Frank)

Thanks for the consideration. I don't think this is in direct conflict with what we want to do, but again, we can discuss further during the ODM.

@WanderAway Could you confirm that there are no objections from your side after the ODM? Thanks!

WanderAway accepted this revision.Jun 22 2023, 11:12 AM

@WanderAway Could you confirm that there are no objections from your side after the ODM? Thanks!

Yup, no objections here.

c-rhodes updated this revision to Diff 534515.Jun 26 2023, 6:20 AM
c-rhodes edited the summary of this revision.

This patch was pretty basic so I've made some improvements/fixes:

  • Write ZA out to memory after zero {za}.
  • Add integration test (runs on QEMU).
  • Check the vector.transfer_write value is a dense arith.constant of zeroes.
  • Simplify tests in mlir/test/Dialect/ArmSME/vector_ops.mlir to pass the memref as an argument rather than setting it up in each function. Also added 3 more tests that check lowering doesn't happen for:
    • non-memref types.
    • non-zero values.
    • a vector.transfer_write where the defining op of the written value isn't visible; this previously crashed if the value was passed as an argument (see the sketch below).
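
A sketch of that last case (function and argument names are illustrative, not the exact test from the patch): the written value is a block argument, so there is no visible defining op to inspect and the pattern should bail out rather than crash:

func.func @transfer_write_2d_arg_value(%vec : vector<[16x16]xi8>, %mem : memref<?x?xi8>) {
  %c0 = arith.constant 0 : index
  // %vec has no defining op here, so the zero-constant check cannot apply.
  vector.transfer_write %vec, %mem[%c0, %c0] {in_bounds = [true, true]}
    : vector<[16x16]xi8>, memref<?x?xi8>
  return
}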

I was a bit unsure whether to abandon this and post a new patch, given this has already been approved and there are a few changes; happy to do that if people prefer.

Thanks for the updates, I've left some comments inline.

I was a bit unsure whether to abandon this and post a new patch given this has already been approved and there's a few changes, happy to do that if people prefer.

IMO we can continue here. You are basically refining the initial design rather than proposing something completely new.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
123

Could you update "mlir/test/Target/LLVMIR/arm-sme.mlir" as well?

mlir/test/Dialect/ArmSME/vector_ops.mlir
24

Doesn't this store the same array vector on each iteration? IIUC, the only thing that's changing is the destination.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
12

[nit] %i1 (variable) is easy to confuse with i1 (type). I would use c1 instead. And if you need different types, c1_idx and c1_i32.

18

[nit] It's worth elaborating what svl stands for in this test. And how do we know that it's going to be "streaming vector length" rather than "vector length"?

34

[nit] Did you mean init_1 instead? Similar comment for addition below.

35

[nit] In this case the induction variable has a very specific meaning. Also, it can be confusing that there are 16 bytes being loaded, but the induction variable is only increased by 1.

37
39

[nit] Same point as for the previous scf.for

41
51

It would be good to also verify that the result is != 1 when the elements in the matrix are different. Would it be possible to set one element to 123 and verify that the result is 123?

75

Shouldn't this be ... ?

83

It would be good to also verify that the result is != 0 when the elements in the matrix are different. Would it be possible to set one element to 321 and see what happens?

awarzynski added inline comments.Jun 28 2023, 3:56 AM
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
22–25

I am fine with loops, but not sure about the comment 🤔 .

vector.store - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir#L9-L20

vector.transfer_write - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Dialect/Linalg/vectorization-scalable.mlir#L27

Like I said, loops are fine (I really like the simplicity). I am just curious what exactly is missing :) But we can investigate that independently of this patch.

c-rhodes updated this revision to Diff 536170.Jun 30 2023, 3:21 AM

Rebase and address comments.

c-rhodes marked 16 inline comments as done.Jun 30 2023, 3:28 AM
c-rhodes added inline comments.
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
123

Could you update "mlir/test/Target/LLVMIR/arm-sme.mlir" as well?

Good spot, cheers.

mlir/test/Dialect/ArmSME/vector_ops.mlir
24

Doesn't this store the same array vector on each iteration? IIUC, the only thing that's changing is the destination.

Doh! It does yeah good spot, fixed.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
18

... And how do we know that it's going to be "streaming vector length" rather than "vector length"?

Streaming mode is enabled by the -enable-arm-streaming pass.
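
As a rough illustration (assuming vector.vscale reflects the streaming vector length once -enable-arm-streaming is applied), SVL in bytes could be derived as:

%min_elts = arith.constant 16 : index   // minimum (streaming) vector length is 128 bits = 16 bytes
%vscale = vector.vscale
%svl_b = arith.muli %min_elts, %vscale : index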

22–25

I am fine with loops, but not sure about the comment 🤔 .

vector.store - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-sve.mlir#L9-L20

vector.transfer_write - scalable - https://github.com/llvm/llvm-project/blob/79c83e12c8884fa46f2f2594836af93474f6ca5a/mlir/test/Dialect/Linalg/vectorization-scalable.mlir#L27

Like I said, loops are fine (I really like the simplicity). I am just curious what exactly is missing :) But we can investigate that independently of this patch.

Apologies, I've updated the comment to reflect the actual problem.

34

[nit] Did you mean init_1 instead? Similar comment for addition below.

It was intended as first init / second init, but I can see how that's confusing; updated to your suggestion.

39

[nit] Same point as for the previous scf.for

Not sure col is applicable here; changed it to (row) offset.

75

Shouldn't this be ... ?

it should. Good spot.

awarzynski accepted this revision.Jul 3 2023, 2:24 AM

Thanks for addressing my comments! I've tested locally and can confirm that the integration test runs correctly 🎉. Great job, LGTM!

Just a few final nits that you can either ignore or address when merging.

Btw, most test files use a hyphen "-" rather than an underscore "_": "vector_ops.mlir" --> "vector-ops.mlir"?

As this remains within the scope of the original submission, I think it's fine to merge without waiting for the other reviewers to confirm (I'm just being mindful that this change has evolved since being originally OK'ed).

mlir/lib/Dialect/ArmSME/Transforms/LowerVectorOps.cpp
41

Just a nit. I feel that it would be good to make it clear, consistently, that these are "virtual" SME tiles.

52

Replace 16 with a constant (it's repeated a few times).

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector_ops.mlir
37–38

IMHO, readability would be better without wrapping lines like this one. And the 80-char limit is rarely observed in tests. This is a nit ;-)

83

A comment might make it easier for our future selves to figure out where the "magic" 60 comes from :) This is a nit.

This revision was automatically updated to reflect the committed changes.
c-rhodes marked 7 inline comments as done.
c-rhodes marked 4 inline comments as done.Jul 3 2023, 3:35 AM

Thanks for addressing my comments! I've tested locally and can confirm that the integration test runs correctly 🎉. Great job, LGTM!

Just a few final nits that you can either ignore or address when merging.

Btw, most test files use a hyphen "-" rather than an underscore "_": "vector_ops.mlir" --> "vector-ops.mlir"?

As this remains within the scope of the original submission, I think it's fine to merge without waiting for the other reviewers to confirm (I'm just being mindful that this change has evolved since being originally OK'ed).

Thanks for reviewing again! Addressed all your comments before committing. Cheers.