This is an archive of the discontinued LLVM Phabricator instance.


Differential D156467

[mlir][ArmSME] Add conversion from ArmSME to SCF to materialize loops
Closed, Public

Authored by c-rhodes on Jul 27 2023, 11:02 AM.


Details

Reviewers

awarzynski
dcaballe
WanderAway
benmxwl-arm
aartbik
ftynse
nicolasvasilache

Commits

rG9e1b82532145: [mlir][ArmSME] Add conversion from ArmSME to SCF to materialize loops

Summary

Currently a loop is materialized when lowering ArmSME loads and stores
to intrinsics. This patch introduces two new ops to the ArmSME dialect
that map 1-1 with intrinsics:

arm_sme.load_tile_slice - Loads a 1D tile slice from memory into a 2D SME "virtual tile".
arm_sme.store_tile_slice - Stores a 1D tile slice from a 2D SME "virtual tile" into memory.

As well as a new conversion pass '-convert-arm-sme-to-scf' that
materializes loops with these ops. The existing load/store lowering to
intrinsics is updated to use these ops.

Depends on D156517

Discourse thread:
https://discourse.llvm.org/t/loop-materialization-in-armsme/72354
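
The loop materialization described in the summary can be modelled outside of MLIR. What follows is a minimal sketch in plain C++, not the code from this patch: `Tile`, `loadTileSlice`, `storeTileSlice`, `tileLoad`, and `tileStore` are hypothetical names chosen for illustration. It assumes a square tile with horizontal slices, contiguous row-major memory, and a `vscale` that is passed in as an ordinary integer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Plain C++ stand-in for a square 2D SME "virtual tile" (not the MLIR type).
using Tile = std::vector<std::vector<int>>;

// Models arm_sme.load_tile_slice: copies one contiguous 1D slice from memory
// into row `sliceIndex` of the tile and returns the updated tile.
Tile loadTileSlice(const std::vector<int> &memory, std::size_t offset,
                   Tile tile, std::size_t sliceIndex) {
  assert(sliceIndex < tile.size());
  std::vector<int> &slice = tile[sliceIndex];
  assert(offset + slice.size() <= memory.size());
  for (std::size_t col = 0; col < slice.size(); ++col)
    slice[col] = memory[offset + col];
  return tile;
}

// Models arm_sme.store_tile_slice: copies row `sliceIndex` of the tile into
// a contiguous region of memory.
void storeTileSlice(const Tile &tile, std::size_t sliceIndex,
                    std::vector<int> &memory, std::size_t offset) {
  assert(sliceIndex < tile.size());
  const std::vector<int> &slice = tile[sliceIndex];
  assert(offset + slice.size() <= memory.size());
  for (std::size_t col = 0; col < slice.size(); ++col)
    memory[offset + col] = slice[col];
}

// Models the materialized loop for a tile load: one slice op per iteration,
// with the trip count computed as minSlices * vscale.
Tile tileLoad(const std::vector<int> &memory, std::size_t minSlices,
              std::size_t vscale) {
  const std::size_t numSlices = minSlices * vscale;
  Tile tile(numSlices, std::vector<int>(numSlices, 0));
  for (std::size_t slice = 0; slice < numSlices; ++slice)
    tile = loadTileSlice(memory, slice * numSlices, std::move(tile), slice);
  return tile;
}

// Models the materialized loop for a tile store.
void tileStore(const Tile &tile, std::vector<int> &memory) {
  const std::size_t numSlices = tile.size();
  for (std::size_t slice = 0; slice < numSlices; ++slice)
    storeTileSlice(tile, slice, memory, slice * numSlices);
}
```

With `minSlices = 4` (the value used for 32-bit elements in the patch's examples) and `vscale = 2`, `tileLoad` runs 8 iterations, each moving one 8-element slice.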

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

c-rhodes created this revision. Jul 27 2023, 11:02 AM

Herald added a reviewer: aartbik. Jul 27 2023, 11:02 AM

Herald added a reviewer: ftynse.

Herald added a project: Restricted Project.

Herald added subscribers: gysit, Dinistro, bviyer and 26 others.

c-rhodes requested review of this revision. Jul 27 2023, 11:02 AM

Herald added a reviewer: nicolasvasilache. Jul 27 2023, 11:02 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B248642: Diff 544860. Jul 27 2023, 11:03 AM

c-rhodes edited the summary of this revision. Jul 27 2023, 11:07 AM

c-rhodes mentioned this in D154867: [mlir][ArmSME] Introduce custom ops for SME. Jul 27 2023, 11:09 AM

Matt added a subscriber: Matt. Jul 27 2023, 11:12 AM

c-rhodes edited the summary of this revision. Jul 28 2023, 2:00 AM

c-rhodes added a parent revision: D156517: [mlir][ArmSME] Add -canonicalize to vector to ArmSME test.

WanderAway added inline comments. Jul 28 2023, 11:06 AM

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
311	I feel like the name is a bit confusing, although I'm not sure I can come up with something better... `LoadTileSlice`? `LoadVectorToTile`? Also I wonder if it makes sense for this to have a horizontal/vertical load variant/argument, so that it would be possible to do in-flight transpose as well?

LGTM! Just a few comments. Thanks for addressing this so quickly!

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
317–318	I don't follow here. The tile slice (1D) is defined by the 2D type? Do you mean by the dimension of the 2D type pointed by the index?
322	-> ~The slice of memory read is defined...?
329	This looks like a vector load with a passthru value so I would remove the `_and_update` part to make it shorter but up to you :)
mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
85	Add a builder guard at the beginning of the function to restore the insertion point
149	same
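
The builder guard requested for lines 85 and 149 follows the usual RAII pattern: the guard records the insertion point when it is constructed and restores it when it goes out of scope, so a pattern that moves into a loop body cannot leak that position to its caller. The updated patch uses `OpBuilder::InsertionGuard g(rewriter);` for this. As a self-contained illustration only, here is a mock in which `MockBuilder`, `MockInsertionGuard`, and `rewriteWithGuard` are hypothetical names and the insertion point is reduced to an integer.

```cpp
#include <cassert>

// Hypothetical stand-in for a builder: the insertion point is just an integer.
struct MockBuilder {
  int insertionPoint = 0;
};

// RAII guard: saves the insertion point on construction and restores it on
// destruction, regardless of how the enclosing scope is left.
class MockInsertionGuard {
public:
  explicit MockInsertionGuard(MockBuilder &b)
      : builder(b), saved(b.insertionPoint) {}
  ~MockInsertionGuard() { builder.insertionPoint = saved; }

  MockInsertionGuard(const MockInsertionGuard &) = delete;
  MockInsertionGuard &operator=(const MockInsertionGuard &) = delete;

private:
  MockBuilder &builder;
  int saved;
};

// A rewrite that temporarily moves the insertion point (for example into a
// loop body) and returns the position it was working at.
int rewriteWithGuard(MockBuilder &builder) {
  MockInsertionGuard guard(builder);
  builder.insertionPoint = 42; // pretend this is the start of the loop body
  return builder.insertionPoint;
}
```

After `rewriteWithGuard` returns, the builder is back at the position it had before the call, which is the behaviour the reviewer asked for.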

This revision is now accepted and ready to land. Jul 28 2023, 11:13 AM

Address comments

c-rhodes marked 6 inline comments as done. Jul 31 2023, 6:49 AM

c-rhodes added inline comments.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
311	> I feel like the name is a bit confusing, although I'm not sure I can come up with something better... `LoadTileSlice`? `LoadVectorToTile`?
It was rather verbose, but until Diego pointed out the passthru on other vector ops I wasn't aware of any ops that worked similarly, so I wanted to be explicit about what this is doing. I've renamed it to 'load_tile_slice' as suggested; I think that reads better than 'tile_slice_load'.
> Also I wonder if it makes sense for this to have a horizontal/vertical load variant/argument, so that it would be possible to do in-flight transpose as well?
I think at some point it probably will, but I've given little consideration to that so far and have gone with the assumption that everything is horizontal. I personally don't want to add it prematurely without understanding how it fits into the big picture. Have you given much thought to how it would be used?
317–318	> I don't follow here. The tile slice (1D) is defined by the 2D type? Do you mean by the dimension of the 2D type pointed by the index?
Apologies, it's quite difficult to write generic descriptions for these ops. What I wanted to capture was that, e.g., for a tile of type `vector<[4]x[4]xi32>` the tile slice is `vector<[4]xi32>`, but I suppose the examples provide clarity there. At some point I would like to update the types in the asm to something like `vector<[4]xi32> from memref<?x?xi8> into vector<[4]x[4]xi32>`. Anyhow, I've updated the description; hopefully it makes more sense.
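
The relationship between a tile type and its slice type can be worked out from the element width. As a rough sketch, assuming a minimum streaming vector length of 128 bits and square tiles, the minimum number of elements in a slice is 128 divided by the element width, and the runtime count is that value times `vscale`. `minTileSliceElements` and `tileSliceElements` are hypothetical helpers written for illustration, not functions from this patch.

```cpp
#include <cassert>

// Assumption: the minimum streaming vector length is 128 bits.
constexpr unsigned kMinStreamingVectorLengthInBits = 128;

// Hypothetical helper: minimum number of elements in one 1D tile slice for a
// given element width, i.e. the N in vector<[N]xT>.
constexpr unsigned minTileSliceElements(unsigned elementBits) {
  return kMinStreamingVectorLengthInBits / elementBits;
}

// Hypothetical helper: number of elements in a slice once vscale is known.
constexpr unsigned tileSliceElements(unsigned elementBits, unsigned vscale) {
  return minTileSliceElements(elementBits) * vscale;
}

// These match the slice types used in the op examples in this revision:
// vector<[16]xi8>, vector<[4]xf32> (and vector<[4]xi32>), vector<[1]xi128>.
static_assert(minTileSliceElements(8) == 16, "i8 slice");
static_assert(minTileSliceElements(32) == 4, "i32/f32 slice");
static_assert(minTileSliceElements(128) == 1, "i128 slice");
```

Under these assumptions a tile of type `vector<[4]x[4]xi32>` has `4 * vscale` slices of `4 * vscale` elements each.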

c-rhodes added a child revision: D156689: [mlir][ArmSME] Use memref indices for load and store. Jul 31 2023, 6:52 AM

Harbormaster completed remote builds in B249197: Diff 545630. Jul 31 2023, 7:41 AM

LGTM, great work, thanks!

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
322–323	[nit] Perhaps this is just me and also English is not my first language, but this reads a bit like: The memref must be either "rank 1" or "rank 2 with dynamic dimensions" i.e. as if "dynamic dimensions" only referred to "rank 2".
355	Given that `type($tile)` will always be equal to `type($result)`, I wonder whether this wouldn't be cleaner: let assemblyFormat = [{ $base `[` $indices `]` `,` $tile `,` $tile_slice_index attr-dict `:` type($base) `,` type($tile) }]; or (with index type): let assemblyFormat = [{ $base `[` $indices `]` `,` $tile `,` $tile_slice_index attr-dict `:` type($base) `,` type($tile) `,` type($tile_slice_index) }]
mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
72	[nit] "Init" in `tileInit` suggests that something might be initialised. Perhaps `tileIdAsVector`?
90	Can you expand?
102–103	Is this 2nd cast really needed? Wouldn't this work: rewriter.replaceOp(tileLoadOp, tileInit); ? I'm probably missing something obvious 🤔 .
mlir/test/Dialect/ArmSME/invalid.mlir
77–81 ↗	(On Diff #545630)	Similar test for `arm_sme.store_tile_slice`? If we preserve this assembly format.

Address comments

> Have you given much thought to how it would be used?

Not too much at the moment. The only thing I have in mind so far is using it for transposition during matmul: in order to do an outer product, the A matrix slices would have to be column vectors and the B matrix slices row vectors. Another reason is that gather loads are not available in streaming mode.
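
To make the matmul point concrete, here is a small sketch of matrix multiplication written as a sum of outer products. It is plain scalar C++ for illustration, not SME code and not part of this patch; `Matrix` and `matmulByOuterProducts` are hypothetical names. Each step needs column `k` of A and row `k` of B. With row-major storage the row of B is contiguous while the column of A is strided, which is where a vertical slice or an in-flight transpose would come in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Computes C = A * B as the sum over k of outer(A[:, k], B[k, :]).
Matrix matmulByOuterProducts(const Matrix &a, const Matrix &b) {
  assert(!a.empty() && !b.empty() && a[0].size() == b.size());
  const std::size_t m = a.size();
  const std::size_t kDim = b.size();
  const std::size_t n = b[0].size();

  Matrix c(m, std::vector<int>(n, 0));
  for (std::size_t k = 0; k < kDim; ++k) {
    // Column k of A: a strided access in row-major memory.
    std::vector<int> aColumn(m);
    for (std::size_t i = 0; i < m; ++i)
      aColumn[i] = a[i][k];

    // Row k of B: contiguous in row-major memory.
    const std::vector<int> &bRow = b[k];

    // Accumulate the rank-1 update outer(aColumn, bRow) into C.
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t j = 0; j < n; ++j)
        c[i][j] += aColumn[i] * bRow[j];
  }
  return c;
}
```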

c-rhodes marked 4 inline comments as done. Jul 31 2023, 10:14 AM

c-rhodes added inline comments.

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
322–323	> [nit] Perhaps this is just me and also English is not my first language, but this reads a bit like: The memref must be either "rank 1" or "rank 2 with dynamic dimensions" i.e. as if "dynamic dimensions" only referred to "rank 2".
Good point, I've clarified it.
355	> Given that `type($tile)` will always be equal to `type($result)`, I wonder whether this wouldn't be cleaner: let assemblyFormat = [{ $base `[` $indices `]` `,` $tile `,` $tile_slice_index attr-dict `:` type($base) `,` type($tile) }];
It was originally like this, but I changed the memref to have indices rather than a single index, and `TypesMatchWith` no longer worked coming after `Variadic<Index>`. I've re-introduced this but moved `indices` after the `tile` to get it to work.
mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
72	> [nit] "Init" in `tileInit` suggests that something might be initialised. Perhaps `tileIdAsVector`?
The idea here was to get an initial tile; I've renamed it to `tile`.
90	> Can you expand?
This is fixed in D156689.
mlir/test/Dialect/ArmSME/invalid.mlir
77–81 ↗	(On Diff #545630)	> Similar test for `arm_sme.store_tile_slice`? If we preserve this assembly format.
The format has changed, but this op doesn't have a result type anyway.

Remove unnecessary cast_vector_to_tile op at the end of tile_load conversion.

c-rhodes marked an inline comment as done. Jul 31 2023, 10:40 AM

c-rhodes added inline comments.

mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp
102–103	> Is this 2nd cast really needed? Wouldn't this work: rewriter.replaceOp(tileLoadOp, tileInit); ? I'm probably missing something obvious 🤔 .
You're right, this isn't necessary; memory semantics prevent these from being reordered. I've removed it.

Harbormaster completed remote builds in B249273: Diff 545737. Jul 31 2023, 12:52 PM

Thanks for addressing the feedback!

This revision was landed with ongoing or failed builds. Aug 1 2023, 1:20 AM

Closed by commit rG9e1b82532145: [mlir][ArmSME] Add conversion from ArmSME to SCF to materialize loops (authored by c-rhodes).

This revision was automatically updated to reflect the committed changes.

c-rhodes marked an inline comment as done.

c-rhodes added a commit: rG9e1b82532145: [mlir][ArmSME] Add conversion from ArmSME to SCF to materialize loops.

Revision Contents

Path	Size
mlir/include/mlir/Conversion/ArmSMEToSCF/ArmSMEToSCF.h	29 lines
mlir/include/mlir/Conversion/Passes.h	1 line
mlir/include/mlir/Conversion/Passes.td	16 lines
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td	97 lines
mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp	187 lines
mlir/lib/Conversion/ArmSMEToSCF/CMakeLists.txt	14 lines
mlir/lib/Conversion/CMakeLists.txt	1 line
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp	131 lines
mlir/test/Conversion/ArmSMEToSCF/arm-sme-to-scf.mlir	36 lines
mlir/test/Dialect/ArmSME/arm-sme-to-llvm-casts.mlir	51 lines
mlir/test/Dialect/ArmSME/roundtrip.mlir	162 lines
mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir	63 lines
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir	3 lines
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir	3 lines

Diff 545958

mlir/include/mlir/Conversion/ArmSMEToSCF/ArmSMEToSCF.h

This file was added.

				//===- ArmSMEToSCF.h - Convert ArmSME to SCF dialect ------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_CONVERSION_ARMSMETOSCF_ARMSMETOSCF_H_
				#define MLIR_CONVERSION_ARMSMETOSCF_ARMSMETOSCF_H_

				#include <memory>

				namespace mlir {
				class Pass;
				class RewritePatternSet;

				#define GEN_PASS_DECL_CONVERTARMSMETOSCF
				#include "mlir/Conversion/Passes.h.inc"

				/// Collect a set of patterns to convert from the ArmSME dialect to SCF.
				void populateArmSMEToSCFConversionPatterns(RewritePatternSet &patterns);

				/// Create a pass to convert a subset of ArmSME ops to SCF.
				std::unique_ptr<Pass> createConvertArmSMEToSCFPass();

				} // namespace mlir

				#endif // MLIR_CONVERSION_ARMSMETOSCF_ARMSMETOSCF_H_

mlir/include/mlir/Conversion/Passes.h

  //===- Passes.h - Conversion Pass Construction and Registration -----------===//
  //
  // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  // See https://llvm.org/LICENSE.txt for license information.
  // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  //
  //===----------------------------------------------------------------------===//

  #ifndef MLIR_CONVERSION_PASSES_H
  #define MLIR_CONVERSION_PASSES_H

  #include "mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h"
  #include "mlir/Conversion/AffineToStandard/AffineToStandard.h"
  #include "mlir/Conversion/ArithToLLVM/ArithToLLVM.h"
  #include "mlir/Conversion/ArithToSPIRV/ArithToSPIRV.h"
  #include "mlir/Conversion/ArmNeon2dToIntr/ArmNeon2dToIntr.h"
+ #include "mlir/Conversion/ArmSMEToSCF/ArmSMEToSCF.h"
  #include "mlir/Conversion/AsyncToLLVM/AsyncToLLVM.h"
  #include "mlir/Conversion/BufferizationToMemRef/BufferizationToMemRef.h"
  #include "mlir/Conversion/ComplexToLLVM/ComplexToLLVM.h"
  #include "mlir/Conversion/ComplexToLibm/ComplexToLibm.h"
  #include "mlir/Conversion/ComplexToSPIRV/ComplexToSPIRVPass.h"
  #include "mlir/Conversion/ComplexToStandard/ComplexToStandard.h"
  #include "mlir/Conversion/ControlFlowToLLVM/ControlFlowToLLVM.h"
  #include "mlir/Conversion/ControlFlowToSPIRV/ControlFlowToSPIRV.h"
[51 lines not shown]

mlir/include/mlir/Conversion/Passes.td

[1,103 lines not shown]
  def ConvertVectorToArmSME : Pass<"convert-vector-to-arm-sme"> {
    let description = [{
      Pass that converts vector dialect operations into equivalent ArmSME dialect
      operations.
    }];
    let dependentDialects = ["arm_sme::ArmSMEDialect"];
  }

  //===----------------------------------------------------------------------===//
+ // ArmSMEToSCF
+ //===----------------------------------------------------------------------===//
+
+ def ConvertArmSMEToSCF : Pass<"convert-arm-sme-to-scf"> {
+   let summary = "Lower the operations from the ArmSME dialect into the SCF "
+                 "dialect";
+   let constructor = "mlir::createConvertArmSMEToSCFPass()";
+   let dependentDialects = [
+     "scf::SCFDialect",
+     "arith::ArithDialect",
+     "vector::VectorDialect",
+     "arm_sme::ArmSMEDialect"
+   ];
+ }
+
+ //===----------------------------------------------------------------------===//
  // VectorToLLVM
  //===----------------------------------------------------------------------===//

  def ConvertVectorToLLVMPass : Pass<"convert-vector-to-llvm"> {
    let summary = "Lower the operations from the vector dialect into the LLVM "
                  "dialect";
    let description = [{

[55 lines not shown]

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td

[10 lines not shown]
  //
  //===----------------------------------------------------------------------===//

  #ifndef ARMSME_OPS
  #define ARMSME_OPS

  include "mlir/Interfaces/SideEffectInterfaces.td"
  include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
+ include "mlir/Interfaces/InferTypeOpInterface.td"

  //===----------------------------------------------------------------------===//
  // ArmSME dialect definition
  //===----------------------------------------------------------------------===//

  def ArmSME_Dialect : Dialect {
    let name = "arm_sme";
    let cppNamespace = "::mlir::arm_sme";
[275 lines not shown]
      VectorType getVectorType() {
        return ::llvm::cast<VectorType>(getValueToStore().getType());
      }
    }];

    let assemblyFormat = "$valueToStore `,` $base `[` $indices `]` attr-dict "
      "`:` type($base) `,` type($valueToStore)";
  }

+ def LoadTileSliceOp : ArmSME_Op<"load_tile_slice", [
+   AllTypesMatch<["tile", "result"]>
+ ]> {
+   let summary = "Tile slice load and update operation";
+   let description = [{
+     Loads a 1D tile slice from memory into a 2D SME "virtual tile". The tile
+     slice is defined by the dimension of the 2D scalable vector type pointed by
+     the index. A tile slice index describes where in the input tile the tile
+     slice is loaded to. The updated tile is returned as the result.
+
+     The slice of memory read is defined by a base and indices and must be
+     contiguous. The memref must be either rank 1 or rank 2, have dynamic
+     dimensions since the operation is scalable, and the element type must be a
+     scalar that matches the element type of the result.
+
+     Example 1: Load a vector<[16]xi8> tile slice from memory into tile at given index.
+     ```mlir
+     %tile_update = arm_sme.load_tile_slice %base[%c0], %tile, %tile_slice_index : memref<?x?xi8>, vector<[16]x[16]xi8>
+     ```
+
+     Example 2: Load a vector<[4]xf32> tile slice from memory into tile at given index.
+     ```mlir
+     %tile_update = arm_sme.load_tile_slice %base[%c0], %tile, %tile_slice_index : memref<?x?xf32>, vector<[4]x[4]xf32>
+     ```
+
+     Example 3: Load a vector<[1]xi128> tile slice from memory into tile at given index.
+     ```mlir
+     %tile_update = arm_sme.load_tile_slice %base[%c0], %tile, %tile_slice_index : memref<?x?xi128>, vector<[1]x[1]xi128>
+     ```
+   }];
+   let arguments = (ins
+     Arg<AnyMemRef, "the reference to load from">:$base,
+     SMETile:$tile, Variadic<Index>:$indices, Index:$tile_slice_index);
+   let results = (outs SMETile:$result);
+
+   let extraClassDeclaration = [{
+     MemRefType getMemRefType() {
+       return ::llvm::cast<MemRefType>(getBase().getType());
+     }
+     VectorType getVectorType() {
+       return ::llvm::cast<VectorType>(getResult().getType());
+     }
+   }];
+
+   let assemblyFormat = [{
+     $base `[` $indices `]` `,` $tile `,` $tile_slice_index
+     attr-dict `:` type($base) `,` type($result)
+   }];
+ }
+
+ def StoreTileSliceOp : ArmSME_Op<"store_tile_slice"> {
+   let summary = "Tile slice store operation";
+   let description = [{
+     Stores a 1D tile slice from a 2D SME "virtual tile" into memory. The tile
+     slice is defined by the dimension of the 2D scalable vector type pointed by
+     the index. A tile slice index describes where in the input tile the tile
+     slice is stored from.
+
+     The slice of memory written is defined by a base and indices and must be
+     contiguous. The memref must be either rank 1 or rank 2, have dynamic
+     dimensions since the operation is scalable, and the element type must be a
+     scalar that matches the element type of the input tile.
+
+     Example 1: Store vector<[16]xi8> tile slice from tile at given index to memory.
+     ```mlir
+     arm_sme.store_tile_slice %tile, %tile_slice_index, %base[%c0] : vector<[16]x[16]xi8>, memref<?x?xi8>
+     ```
+
+     Example 2: Store vector<[4]xf32> tile slice from tile at given index to memory.
+     ```mlir
+     arm_sme.store_tile_slice %tile, %tile_slice_index, %base[%c0] : vector<[4]x[4]xf32>, memref<?x?xf32>
+     ```
+
+     Example 3: Store a vector<[1]xi128> tile slice from tile at given index to memory.
+     ```mlir
+     arm_sme.store_tile_slice %tile, %tile_slice_index, %base[%c0] : vector<[1]x[1]xi128>, memref<?x?xi128>
+     ```
+   }];
+   let arguments = (ins SMETile:$tile, Index:$tile_slice_index,
+     Arg<AnyMemRef, "the reference to store to", [MemWrite]>:$base,
+     Variadic<Index>:$indices);
+   let extraClassDeclaration = [{
+     MemRefType getMemRefType() {
+       return ::llvm::cast<MemRefType>(getBase().getType());
+     }
+     VectorType getVectorType() {
+       return ::llvm::cast<VectorType>(getTile().getType());
+     }
+   }];
+
+   let assemblyFormat = [{
+     $tile `,` $tile_slice_index `,` $base `[` $indices `]`
+     attr-dict `:` type($base) `,` type($tile)
+   }];
+ }

  //===----------------------------------------------------------------------===//
  // ArmSME Intrinsic op definitions
  //===----------------------------------------------------------------------===//

  def MOPPredicate : ScalableVectorOfLengthAndType<[16, 8, 4, 2], [I1]>;
  def MOPVector : ScalableVectorOfLengthAndType<[16, 8, 4, 2],
                                                [I8, I16, BF16, F16, F32, F64]>;
  def LDSTPredicate : ScalableVectorOfLengthAndType<[16, 8, 4, 2, 1], [I1]>;
[85 lines not shown]

mlir/lib/Conversion/ArmSMEToSCF/ArmSMEToSCF.cpp

This file was added.

				//===- ArmSMEToSCF.cpp - Convert ArmSME to SCF dialect ----------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements lowering of ArmSME operations to SCF.
				//
				//===----------------------------------------------------------------------===//
				#include "mlir/Conversion/ArmSMEToSCF/ArmSMEToSCF.h"

				#include "mlir/Dialect/Arith/IR/Arith.h"
				#include "mlir/Dialect/ArmSME/IR/ArmSME.h"
				#include "mlir/Dialect/ArmSME/Utils/Utils.h"
				#include "mlir/Dialect/SCF/IR/SCF.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/DialectConversion.h"

				namespace mlir {
				#define GEN_PASS_DEF_CONVERTARMSMETOSCF
				#include "mlir/Conversion/Passes.h.inc"
				} // namespace mlir

				using namespace mlir;

				namespace {

				/// Lower `arm_sme.tile_load` to a loop over the tile slices and load each slice
				/// using `arm_sme.load_tile_slice`.
				///
				/// BEFORE:
				/// ```mlir
				/// %tile = arm_sme.tile_load %src[%c0, %c0] :
				/// memref<?x?xi32>, vector<[4]x[4]xi32>
				/// ```
				///
				/// AFTER:
				/// ```mlir
				/// %tile_id = arm_sme.get_tile_id : i32
				/// %tile = arm_sme.cast_tile_to_vector %tile_id : i32 to vector<[4]x[4]xi32>
				/// %vscale = vector.vscale
				/// %c0 = arith.constant 0 : index
				/// %c1 = arith.constant 1 : index
				/// %min_svl_s = arith.constant 4 : index
				/// %svl_s = arith.muli %min_svl_s, %vscale : index
				/// scf.for %tile_slice_idx = %c0 to %svl_s step %c1 {
				/// %tile_update = arm_sme.load_tile_slice %src[%tile_slice_idx],
				/// %tile, %tile_slice_idx : memref<?x?xi32>, vector<[4]x[4]xi32>
				/// }
				/// ```
				struct TileLoadOpConversion : public OpRewritePattern<arm_sme::TileLoadOp> {
				using OpRewritePattern<arm_sme::TileLoadOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(arm_sme::TileLoadOp tileLoadOp,
				PatternRewriter &rewriter) const override {
				OpBuilder::InsertionGuard g(rewriter);
				auto loc = tileLoadOp.getLoc();
				auto tileType = tileLoadOp.getVectorType();
				auto tileElementType = tileType.getElementType();
				unsigned tileElementWidth = tileElementType.getIntOrFloatBitWidth();

				// Create 'arm_sme.get_tile' op.
				auto tileId = rewriter.create<arm_sme::GetTileID>(
				loc, rewriter.getIntegerType(tileElementWidth));

				// Create `arm_sme.cast_tile_to_vector` to cast tile ID to a vector type to
				// use as input tile to 'arm_sme.load_tile_slice' ops.
				auto tile =
				rewriter.create<arm_sme::CastTileToVector>(loc, tileType, tileId);

				awarzynskiUnsubmitted Done Reply Inline Actions [nit] "Init" in `tileInit` suggests that something might be initialised. Perhaps `tileIdAsVector`? awarzynski: [nit] "Init" in `tileInit` suggests that something might be initialised. Perhaps…
				c-rhodesAuthorUnsubmitted Done Reply Inline Actions [nit] "Init" in `tileInit` suggests that something might be initialised. Perhaps `tileIdAsVector`? the idea here was to get an initial tile, I've renamed it to `tile`. c-rhodes: > [nit] "Init" in `tileInit` suggests that something might be initialised. Perhaps…
				// Create a loop that loads each ZA tile slice from memory.
				auto step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
				auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(
				loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));
				auto vscale =
				rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
				auto lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
				auto numTileSlices =
				rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);
				auto forOp =
				rewriter.create<scf::ForOp>(loc, lowerBound, numTileSlices, step);

				rewriter.setInsertionPointToStart(forOp.getBody());
				dcaballeUnsubmitted Done Reply Inline Actions Add a builder guard at the beginning of the function to restore the insertion point dcaballe: Add a builder guard at the beginning of the function to restore the insertion point

				auto tileSliceIndex = forOp.getInductionVar();
				// TODO: use indices
				// Create 'arm_sme.load_tile_slice' to load tile slice from
				// memory into tile.
				awarzynski: Can you expand?
				c-rhodes: This is fixed in D156689.
				rewriter.create<arm_sme::LoadTileSliceOp>(
				loc, tileType, tileLoadOp.getBase(), tile, tileSliceIndex,
				tileSliceIndex);

				rewriter.setInsertionPointAfter(forOp);

				// Replace 'arm_sme.tile_load' with the tile.
				rewriter.replaceOp(tileLoadOp, tile);

				return success();
				}
				};

				awarzynski: Is this 2nd cast really needed? Wouldn't this work: `rewriter.replaceOp(tileLoadOp, tileInit);`? I'm probably missing something obvious 🤔.
				c-rhodes: You're right, this isn't necessary; memory semantics prevent these from being reordered. I've removed it.
				/// Lower `arm_sme.tile_store` to a loop over the tile slices and store each
				/// slice using `arm_sme.store_tile_slice`.
				///
				/// BEFORE:
				/// ```mlir
				/// arm_sme.tile_store %tile, %dest[%c0, %c0]
				/// : memref<?x?xi32>, vector<[4]x[4]xi32>
				/// ```
				///
				/// AFTER:
				/// ```mlir
				/// %vscale = vector.vscale
				/// %c0 = arith.constant 0 : index
				/// %c1 = arith.constant 1 : index
				/// %min_svl_s = arith.constant 4 : index
				/// %svl_s = arith.muli %min_svl_s, %vscale : index
				/// scf.for %tile_slice_idx = %c0 to %svl_s step %c1 {
				/// arm_sme.store_tile_slice %tile, %tile_slice_idx, %dest[%tile_slice_idx]
				/// : memref<?x?xi32>, vector<[4]x[4]xi32>
				/// }
				/// ```
				struct TileStoreOpConversion : public OpRewritePattern<arm_sme::TileStoreOp> {
				using OpRewritePattern<arm_sme::TileStoreOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(arm_sme::TileStoreOp tileStoreOp,
				PatternRewriter &rewriter) const override {
				OpBuilder::InsertionGuard g(rewriter);
				auto loc = tileStoreOp.getLoc();
				auto tileType = tileStoreOp.getVectorType();
				auto tileElementType = tileType.getElementType();

				// Create a loop that stores each ZA tile slice to memory.
				auto step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
				auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(
				loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));
				auto vscale =
				rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
				auto lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
				auto numTileSlices =
				rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);
				auto forOp =
				rewriter.create<scf::ForOp>(loc, lowerBound, numTileSlices, step);

				rewriter.setInsertionPointToStart(forOp.getBody());

				auto tileSliceIndex = forOp.getInductionVar();
				dcaballe: same
				// TODO: use indices
				rewriter.replaceOpWithNewOp<arm_sme::StoreTileSliceOp>(
				tileStoreOp, tileStoreOp.getValueToStore(), tileSliceIndex,
				tileStoreOp.getBase(), tileSliceIndex);

				return success();
				}
				};

				} // namespace

				void mlir::populateArmSMEToSCFConversionPatterns(RewritePatternSet &patterns) {
				patterns.add<TileLoadOpConversion, TileStoreOpConversion>(
				patterns.getContext());
				}

				namespace {

				struct ConvertArmSMEToSCFPass
				: public impl::ConvertArmSMEToSCFBase<ConvertArmSMEToSCFPass> {
				void runOnOperation() override {
				RewritePatternSet patterns(&getContext());
				ConversionTarget target(getContext());
				populateArmSMEToSCFConversionPatterns(patterns);
				target.addLegalDialect<arm_sme::ArmSMEDialect, vector::VectorDialect,
				arith::ArithDialect, scf::SCFDialect>();
				target.addIllegalOp<arm_sme::TileLoadOp, arm_sme::TileStoreOp>();
				if (failed(applyPartialConversion(getOperation(), target,
				std::move(patterns))))
				signalPassFailure();
				}
				};

				} // namespace

				std::unique_ptr<Pass> mlir::createConvertArmSMEToSCFPass() {
				return std::make_unique<ConvertArmSMEToSCFPass>();
				}
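
Both conversion patterns above emit the same loop shape: one iteration per tile slice, from 0 to `minTileSlices * vscale` in steps of 1. The following is a minimal standalone C++ sketch of what that materialized loop computes for a tile load. It is not part of the patch: `Tile`, `loadTile`, and `rowStride` are hypothetical names, and it assumes a row-major source buffer where slice `i` reads row `i` (the `TODO: use indices` comments indicate that the memref indices are not yet honoured).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a 2D tile: numTileSlices rows of numTileSlices
// elements each.
using Tile = std::vector<std::vector<int32_t>>;

// Models the scf.for built by TileLoadOpConversion. Each outer iteration
// stands in for one arm_sme.load_tile_slice.
// Assumption: slice i reads row i of a row-major buffer with stride rowStride.
Tile loadTile(const std::vector<int32_t> &mem, std::size_t rowStride,
              std::size_t minTileSlices, std::size_t vscale) {
  // Corresponds to the arith.muli of minTileSlices and vscale.
  const std::size_t numTileSlices = minTileSlices * vscale;
  Tile tile(numTileSlices, std::vector<int32_t>(numTileSlices, 0));
  // Lower bound 0, upper bound numTileSlices, step 1.
  for (std::size_t slice = 0; slice < numTileSlices; ++slice)
    for (std::size_t col = 0; col < numTileSlices; ++col)
      tile[slice][col] = mem.at(slice * rowStride + col);
  return tile;
}
```

Calling `loadTile(mem, 8, 4, 1)` on a buffer with 8 elements per row fills a 4x4 tile from the top-left corner of the buffer. The store pattern would be the mirror image, writing each slice back to its row.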

mlir/lib/Conversion/ArmSMEToSCF/CMakeLists.txt

This file was added.

				add_mlir_conversion_library(MLIRArmSMEToSCF
				ArmSMEToSCF.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/ArmSMEToSCF

				DEPENDS
				MLIRConversionPassIncGen

				LINK_LIBS PUBLIC
				MLIRArmSMEDialect
				MLIRArmSMEUtils
				MLIRTransforms
				)

mlir/lib/Conversion/CMakeLists.txt

	add_subdirectory(AffineToStandard)			add_subdirectory(AffineToStandard)
	add_subdirectory(AMDGPUToROCDL)			add_subdirectory(AMDGPUToROCDL)
	add_subdirectory(ArithCommon)			add_subdirectory(ArithCommon)
	add_subdirectory(ArithToLLVM)			add_subdirectory(ArithToLLVM)
	add_subdirectory(ArithToSPIRV)			add_subdirectory(ArithToSPIRV)
	add_subdirectory(ArmNeon2dToIntr)			add_subdirectory(ArmNeon2dToIntr)
				add_subdirectory(ArmSMEToSCF)
	add_subdirectory(AsyncToLLVM)			add_subdirectory(AsyncToLLVM)
	add_subdirectory(BufferizationToMemRef)			add_subdirectory(BufferizationToMemRef)
	add_subdirectory(ComplexToLibm)			add_subdirectory(ComplexToLibm)
	add_subdirectory(ComplexToLLVM)			add_subdirectory(ComplexToLLVM)
	add_subdirectory(ComplexToSPIRV)			add_subdirectory(ComplexToSPIRV)
	add_subdirectory(ComplexToStandard)			add_subdirectory(ComplexToStandard)
	add_subdirectory(ControlFlowToLLVM)			add_subdirectory(ControlFlowToLLVM)
	add_subdirectory(ControlFlowToSPIRV)			add_subdirectory(ControlFlowToSPIRV)
	[41 lines hidden]

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp

[132 lines hidden]	Value getTileSlicePtrIndex(unsigned rank, Value tileSliceIndex,
}		}

if (rank == 2)		if (rank == 2)
return tileSliceIndexI64;		return tileSliceIndexI64;

llvm_unreachable("memref has unexpected rank!");		llvm_unreachable("memref has unexpected rank!");
}		}

/// Conversion pattern for `arm_sme.tile_load` to SME intrinsics.		/// Lower `arm_sme.load_tile_slice` to SME intrinsics.
///		struct LoadTileSliceToArmSMELowering
/// Lower `arm_sme.tile_load` to a loop over the rows of ZA and load each row		: public ConvertOpToLLVMPattern<arm_sme::LoadTileSliceOp> {
/// using `arm_sme.intr.ld1*.horiz`.		using ConvertOpToLLVMPattern<
///		arm_sme::LoadTileSliceOp>::ConvertOpToLLVMPattern;
/// BEFORE:
/// ```mlir
/// %tile = arm_sme.tile_load %base[%c0, %c0] :
/// memref<?x?xi32>, vector<[4]x[4]xi32>
/// ```
///
/// AFTER:
/// ```mlir
/// %tile_id = arm_sme.get_tile_id : i32
/// %vscale = vector.vscale
/// %c0 = arith.constant 0 : index
/// %c1 = arith.constant 1 : index
/// %min_svl_s = arith.constant 4 : index
/// %svl_s = arith.muli %min_svl_s, %vscale : index
/// scf.for %tile_slice = %c0 to %svl_s step %c1 {
/// // (...)
/// "arm_sme.intr.ld1w.horiz"(%ptrue_s, %ptr, %tile_id, %tile_slice) :
/// (vector<[4]xi1>, !llvm.ptr, i32, i32) -> ()
/// }
/// %tile = arm_sme.cast_tile_to_vector %tile_id : i32 to vector<[4]x[4]xi32>
/// ```
struct TileLoadToArmSMELowering
: public ConvertOpToLLVMPattern<arm_sme::TileLoadOp> {
using ConvertOpToLLVMPattern<arm_sme::TileLoadOp>::ConvertOpToLLVMPattern;

LogicalResult		LogicalResult
matchAndRewrite(arm_sme::TileLoadOp tileLoadOp,		matchAndRewrite(arm_sme::LoadTileSliceOp loadTileSliceOp,
arm_sme::TileLoadOp::Adaptor adaptor,		arm_sme::LoadTileSliceOp::Adaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
auto loc = tileLoadOp.getLoc();		auto loc = loadTileSliceOp.getLoc();
auto tileType = tileLoadOp.getVectorType();		auto tileType = loadTileSliceOp.getVectorType();
auto tileElementType = tileType.getElementType();		auto tileElementType = tileType.getElementType();
unsigned tileElementWidth = tileElementType.getIntOrFloatBitWidth();		unsigned tileElementWidth = tileElementType.getIntOrFloatBitWidth();

// Create 'arm_sme.get_tile_id' op.		// Create 'arm_sme.cast_vector_to_tile' to get a tile ID for the tile being
auto tile = rewriter.create<arm_sme::GetTileID>(		// loaded to.
loc, rewriter.getIntegerType(tileElementWidth));		auto tile = rewriter.create<arm_sme::CastVectorToTile>(
		loc, rewriter.getIntegerType(tileElementWidth),
		loadTileSliceOp.getTile());

// Create a loop that loads each ZA tile slice from memory.
auto step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(		auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(
loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));		loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));
auto vscale =		auto vscale =
rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());		rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
auto lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
// This describes both the number of ZA tile slices and the number of		// This describes both the number of ZA tile slices and the number of
// elements in a vector of SVL bits for a given element type (SVL_B, SVL_H,		// elements in a vector of SVL bits for a given element type (SVL_B, SVL_H,
// ..., SVL_Q).		// ..., SVL_Q).
auto numTileSlices =		auto numTileSlices =
rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);		rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);
auto forOp =
rewriter.create<scf::ForOp>(loc, lowerBound, numTileSlices, step);
rewriter.setInsertionPointToStart(forOp.getBody());

// Create 'arm_sme.intr.ld1*.horiz' intrinsic to load ZA tile slice.		// Create 'arm_sme.intr.ld1*.horiz' intrinsic to load ZA tile slice.
auto memRefType = tileLoadOp.getMemRefType();		auto memRefType = loadTileSliceOp.getMemRefType();
auto tileSlice = forOp.getInductionVar();		auto tileSlice = loadTileSliceOp.getTileSliceIndex();
// TODO: The 'indices' argument for the 'base' memref is currently ignored,		// TODO: The 'indices' argument for the 'base' memref is currently ignored,
// 'tileSliceIndex' should be added to 'indices[0]'.		// 'tileSliceIndex' should be added to 'indices[0]'.
Value tileSliceIndex = getTileSlicePtrIndex(memRefType.getRank(), tileSlice,		Value tileSliceIndex = getTileSlicePtrIndex(memRefType.getRank(), tileSlice,
numTileSlices, loc, rewriter);		numTileSlices, loc, rewriter);
Value ptr = this->getStridedElementPtr(loc, memRefType, adaptor.getBase(),		Value ptr = this->getStridedElementPtr(loc, memRefType, adaptor.getBase(),
{tileSliceIndex}, rewriter);		{tileSliceIndex}, rewriter);

// Cast tile slice to i32 for intrinsic.		// Cast tile slice to i32 for intrinsic.
[25 lines hidden]	case 32:
tileI32, tileSliceI32);		tileI32, tileSliceI32);
break;		break;
case 64:		case 64:
rewriter.create<arm_sme::aarch64_sme_ld1d_horiz>(loc, allActiveMask, ptr,		rewriter.create<arm_sme::aarch64_sme_ld1d_horiz>(loc, allActiveMask, ptr,
tileI32, tileSliceI32);		tileI32, tileSliceI32);
break;		break;
}		}

rewriter.setInsertionPointAfter(forOp);

// The load intrinsics have no result, replace 'arm_sme.tile_load' with		// The load intrinsics have no result, replace 'arm_sme.tile_load' with
// 'arm_sme.cast_tile_to_vector' to preserve dataflow.		// 'arm_sme.cast_tile_to_vector' to preserve dataflow.
rewriter.replaceOpWithNewOp<arm_sme::CastTileToVector>(tileLoadOp, tileType,		rewriter.replaceOpWithNewOp<arm_sme::CastTileToVector>(loadTileSliceOp,
tile);		tileType, tile);

return success();		return success();
}		}
};		};

/// Conversion pattern for `arm_sme.tile_store` to SME intrinsics.		/// Lower `arm_sme.store_tile_slice` to SME intrinsics.
///		struct StoreTileSliceToArmSMELowering
/// Lower `arm_sme.tile_store` to a loop over the rows of ZA and store each row		: public ConvertOpToLLVMPattern<arm_sme::StoreTileSliceOp> {
/// using `arm_sme.intr.st1*.horiz`.		using ConvertOpToLLVMPattern<
///		arm_sme::StoreTileSliceOp>::ConvertOpToLLVMPattern;
/// BEFORE:
/// ```mlir
/// arm_sme.tile_store %value, %base[%c0, %c0] : memref<?x?xi32>,
/// vector<[4]x[4]xi32>
/// ```
///
/// AFTER:
/// ```mlir
/// %tile_id = arm_sme.cast_vector_to_tile %tile : vector<[4]x[4]xi32> to i32
/// %vscale = vector.vscale
/// %c0 = arith.constant 0 : index
/// %c1 = arith.constant 1 : index
/// %min_svl_s = arith.constant 4 : index
/// %svl_s = arith.muli %min_svl_s, %vscale : index
/// scf.for %tile_slice = %c0 to %svl_s step %c1 {
/// // (...)
/// "arm_sme.intr.st1w.horiz"(%ptrue_s, %ptr, %tile_id, %tile_slice) :
/// (vector<[4]xi1>, !llvm.ptr, i32, i32) -> ()
/// }
/// ```
struct TileStoreToArmSMELowering
: public ConvertOpToLLVMPattern<arm_sme::TileStoreOp> {
using ConvertOpToLLVMPattern<arm_sme::TileStoreOp>::ConvertOpToLLVMPattern;

LogicalResult		LogicalResult
matchAndRewrite(arm_sme::TileStoreOp tileStoreOp,		matchAndRewrite(arm_sme::StoreTileSliceOp storeTileSliceOp,
arm_sme::TileStoreOp::Adaptor adaptor,		arm_sme::StoreTileSliceOp::Adaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
auto loc = tileStoreOp.getLoc();		auto loc = storeTileSliceOp.getLoc();
auto tileType = tileStoreOp.getVectorType();		auto tileType = storeTileSliceOp.getVectorType();
auto tileElementType = tileType.getElementType();		auto tileElementType = tileType.getElementType();
unsigned tileElementWidth = tileElementType.getIntOrFloatBitWidth();		unsigned tileElementWidth = tileElementType.getIntOrFloatBitWidth();

// Create 'arm_sme.cast_vector_to_tile' to get a tile ID for the vector		// Create 'arm_sme.cast_vector_to_tile' to get a tile ID for the vector
// being stored.		// being stored.
auto tile = rewriter.create<arm_sme::CastVectorToTile>(		auto tile = rewriter.create<arm_sme::CastVectorToTile>(
loc, rewriter.getIntegerType(tileElementWidth),		loc, rewriter.getIntegerType(tileElementWidth),
tileStoreOp.getValueToStore());		storeTileSliceOp.getTile());

// Create a loop that stores each ZA tile slice to memory.
auto step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(		auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(
loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));		loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));
auto vscale =		auto vscale =
rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());		rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
auto lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
// This describes both the number of ZA tile slices and the number of		// This describes both the number of ZA tile slices and the number of
// elements in a vector of SVL bits for a given element type (SVL_B, SVL_H,		// elements in a vector of SVL bits for a given element type (SVL_B, SVL_H,
// ..., SVL_Q).		// ..., SVL_Q).
auto numTileSlices =		auto numTileSlices =
rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);		rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);
auto forOp =
rewriter.create<scf::ForOp>(loc, lowerBound, numTileSlices, step);
rewriter.setInsertionPointToStart(forOp.getBody());

// Create 'arm_sme.intr.st1*.horiz' intrinsic to store ZA tile slice.		// Create 'arm_sme.intr.st1*.horiz' intrinsic to store ZA tile slice.
auto memRefType = tileStoreOp.getMemRefType();		auto memRefType = storeTileSliceOp.getMemRefType();
auto tileSlice = forOp.getInductionVar();		auto tileSlice = storeTileSliceOp.getTileSliceIndex();
// TODO: The 'indices' argument for the 'base' memref is currently ignored,		// TODO: The 'indices' argument for the 'base' memref is currently ignored,
// 'tileSliceIndex' should be added to 'indices[0]'.		// 'tileSliceIndex' should be added to 'indices[0]'.
Value tileSliceIndex = getTileSlicePtrIndex(memRefType.getRank(), tileSlice,		Value tileSliceIndex = getTileSlicePtrIndex(memRefType.getRank(), tileSlice,
numTileSlices, loc, rewriter);		numTileSlices, loc, rewriter);
Value ptr = this->getStridedElementPtr(loc, memRefType, adaptor.getBase(),		Value ptr = this->getStridedElementPtr(loc, memRefType, adaptor.getBase(),
{tileSliceIndex}, rewriter);		{tileSliceIndex}, rewriter);

// Cast tile slice to i32 for intrinsic.		// Cast tile slice to i32 for intrinsic.
[9 lines hidden]	matchAndRewrite(arm_sme::StoreTileSliceOp storeTileSliceOp,
auto allActiveMask = rewriter.create<vector::SplatOp>(loc, predTy, one);		auto allActiveMask = rewriter.create<vector::SplatOp>(loc, predTy, one);

Value tileI32 = castTileIDToI32(tile, loc, rewriter);		Value tileI32 = castTileIDToI32(tile, loc, rewriter);
switch (tileElementWidth) {		switch (tileElementWidth) {
default:		default:
llvm_unreachable("unexpected element type!");		llvm_unreachable("unexpected element type!");
case 8:		case 8:
rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1b_horiz>(		rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1b_horiz>(
tileStoreOp, allActiveMask, ptr, tileI32, tileSliceI32);		storeTileSliceOp, allActiveMask, ptr, tileI32, tileSliceI32);
break;		break;
case 16:		case 16:
rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1h_horiz>(		rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1h_horiz>(
tileStoreOp, allActiveMask, ptr, tileI32, tileSliceI32);		storeTileSliceOp, allActiveMask, ptr, tileI32, tileSliceI32);
break;		break;
case 32:		case 32:
rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1w_horiz>(		rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1w_horiz>(
tileStoreOp, allActiveMask, ptr, tileI32, tileSliceI32);		storeTileSliceOp, allActiveMask, ptr, tileI32, tileSliceI32);
break;		break;
case 64:		case 64:
rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1d_horiz>(		rewriter.replaceOpWithNewOp<arm_sme::aarch64_sme_st1d_horiz>(
tileStoreOp, allActiveMask, ptr, tileI32, tileSliceI32);		storeTileSliceOp, allActiveMask, ptr, tileI32, tileSliceI32);
break;		break;
}		}

return success();		return success();
}		}
};		};

} // namespace		} // namespace
[34 lines hidden]	funcOp->walk<WalkOrder::PreOrder>(
[&](arm_sme::aarch64_sme_za_disable op) { hasDisableZA = true; });		[&](arm_sme::aarch64_sme_za_disable op) { hasDisableZA = true; });
return !funcOp->hasAttr("arm_za") || hasDisableZA;		return !funcOp->hasAttr("arm_za") || hasDisableZA;
});		});
}		}

void mlir::populateArmSMELegalizeForLLVMExportPatterns(		void mlir::populateArmSMELegalizeForLLVMExportPatterns(
LLVMTypeConverter &converter, RewritePatternSet &patterns) {		LLVMTypeConverter &converter, RewritePatternSet &patterns) {
patterns.add<EnableZAPattern, DisableZAPattern>(patterns.getContext());		patterns.add<EnableZAPattern, DisableZAPattern>(patterns.getContext());
patterns.add<ZeroOpConversion, TileLoadToArmSMELowering,		patterns.add<ZeroOpConversion, StoreTileSliceToArmSMELowering,
TileStoreToArmSMELowering>(converter);		LoadTileSliceToArmSMELowering>(converter);
}		}
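
The slice lowerings select an intrinsic by switching on `tileElementWidth`. A minimal standalone sketch of that mapping is given here, covering only the four widths visible in the diff (8, 16, 32, 64). `horizIntrinsicName` is a hypothetical helper for illustration and is not part of the patch; the real code creates the intrinsic ops directly rather than building names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the name of the horizontal load/store
// intrinsic for an element width, mirroring the visible switch cases.
std::string horizIntrinsicName(bool isLoad, unsigned tileElementWidth) {
  char suffix = '\0';
  switch (tileElementWidth) {
  case 8:
    suffix = 'b';
    break;
  case 16:
    suffix = 'h';
    break;
  case 32:
    suffix = 'w';
    break;
  case 64:
    suffix = 'd';
    break;
  default:
    // The patch uses llvm_unreachable here; an exception keeps the sketch
    // standalone.
    throw std::invalid_argument("unexpected element type!");
  }
  return std::string("arm_sme.intr.") + (isLoad ? "ld1" : "st1") + suffix +
         ".horiz";
}
```

For example, `horizIntrinsicName(false, 8)` yields `arm_sme.intr.st1b.horiz`, the intrinsic checked for in the `i8` store tests.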

mlir/test/Conversion/ArmSMEToSCF/arm-sme-to-scf.mlir

This file was added.

				// RUN: mlir-opt %s -convert-arm-sme-to-scf -cse -split-input-file | FileCheck %s

				// CHECK-LABEL: func.func @arm_sme_tile_load(
				// CHECK-SAME: %[[SRC:.*]]: memref<?x?xi32>) {
				// CHECK-NEXT: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i32
				// CHECK-NEXT: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i32 to vector<[4]x[4]xi32>
				// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
				// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
				// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
				// CHECK-DAG: %[[VSCALE:.*]] = vector.vscale
				// CHECK-NEXT: %[[NUM_TILE_SLICES:.*]] = arith.muli %[[C4]], %[[VSCALE]] : index
				// CHECK-NEXT: scf.for %[[TILE_SLICE_INDEX:.*]] = %[[C0]] to %[[NUM_TILE_SLICES]] step %[[C1]] {
				// CHECK-NEXT: arm_sme.load_tile_slice %[[SRC]]{{\[}}%[[TILE_SLICE_INDEX]]], %[[CAST_TILE_TO_VECTOR]], %[[TILE_SLICE_INDEX]] : memref<?x?xi32>, vector<[4]x[4]xi32>
				func.func @arm_sme_tile_load(%src : memref<?x?xi32>) {
				%c0 = arith.constant 0 : index
				%tile = arm_sme.tile_load %src[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
				return
				}

				// -----

				// CHECK-LABEL: func.func @arm_sme_tile_store(
				// CHECK-SAME: %[[TILE:.*]]: vector<[4]x[4]xi32>,
				// CHECK-SAME: %[[DEST:.*]]: memref<?x?xi32>) {
				// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
				// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
				// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
				// CHECK-DAG: %[[VSCALE:.*]] = vector.vscale
				// CHECK: %[[NUM_TILE_SLICES:.*]] = arith.muli %[[C4]], %[[VSCALE]] : index
				// CHECK: scf.for %[[TILE_SLICE_INDEX:.*]] = %[[C0]] to %[[NUM_TILE_SLICES]] step %[[C1]] {
				// CHECK: arm_sme.store_tile_slice %[[TILE]], %[[TILE_SLICE_INDEX]], %[[DEST]]{{\[}}%[[TILE_SLICE_INDEX]]] : memref<?x?xi32>, vector<[4]x[4]xi32>
				func.func @arm_sme_tile_store(%tile : vector<[4]x[4]xi32>, %dest : memref<?x?xi32>) {
				%c0 = arith.constant 0 : index
				arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
				return
				}
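
The `arith.constant 4 : index` in the checks above is the minimum number of tile slices for `i32`, and the loop upper bound is that constant multiplied by `vscale`. A small sketch of the arithmetic follows. It assumes that `arm_sme::getSMETileSliceMinNumElts` divides the 128-bit minimum streaming vector length by the element width; the constant and function names below are hypothetical.

```cpp
#include <cstdint>

// Assumption: the minimum streaming vector length is 128 bits.
constexpr unsigned kMinStreamingVectorLengthInBits = 128;

// Minimum number of tile slices (and elements per slice) for an element
// width in bits.
constexpr unsigned minTileSlices(unsigned elementBits) {
  return kMinStreamingVectorLengthInBits / elementBits;
}

// Loop upper bound: the minimum slice count scaled by the runtime vscale.
constexpr std::uint64_t numTileSlices(unsigned elementBits,
                                      std::uint64_t vscale) {
  return static_cast<std::uint64_t>(minTileSlices(elementBits)) * vscale;
}
```

With 32-bit elements this gives 4 slices at `vscale = 1` and 8 slices at `vscale = 2`.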

mlir/test/Dialect/ArmSME/arm-sme-to-llvm-casts.mlir

This file was added.

				// RUN: mlir-opt %s -convert-arm-sme-to-scf -convert-vector-to-llvm="enable-arm-sme" -split-input-file | FileCheck %s

				// This test verifies the temporary casts that are emitted when lowering to
				// intrinsics to preserve data flow are correct. Canonicalization will remove
				// these.

				// CHECK-LABEL: @arm_sme_zero
				// CHECK: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8
				// CHECK: arm_sme.intr.zero
				// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>
				// CHECK: scf.for
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[CAST_TILE_TO_VECTOR]] : vector<[16]x[16]xi8> to i8
				// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i8 to i32
				// CHECK: "arm_sme.intr.st1b.horiz"({{.*}}, {{.*}}, %[[TILE_ID_I32]], {{.*}}) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()
				func.func @arm_sme_zero(%dest : memref<?x?xi8>) {
				%c0 = arith.constant 0 : index
				%tile = arm_sme.zero : vector<[16]x[16]xi8>
				arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
				return
				}

				// -----

				// CHECK-LABEL: @arm_sme_tile_load
				// CHECK: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8
				// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>
				// CHECK: scf.for
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[CAST_TILE_TO_VECTOR]] : vector<[16]x[16]xi8> to i8
				// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i8 to i32
				// CHECK: "arm_sme.intr.ld1b.horiz"({{.*}}, {{.*}}, %[[TILE_ID_I32]], {{.*}}) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()
				// CHECK: }
				// CHECK: return %[[CAST_TILE_TO_VECTOR]] : vector<[16]x[16]xi8>
				func.func @arm_sme_tile_load(%dest : memref<?x?xi8>) -> vector<[16]x[16]xi8> {
				%c0 = arith.constant 0 : index
				%tile = arm_sme.tile_load %dest[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
				return %tile : vector<[16]x[16]xi8>
				}

				// -----

				// CHECK-LABEL: @arm_sme_tile_store(
				// CHECK-SAME: %[[TILE:.*]]: vector<[16]x[16]xi8>,
				// CHECK: scf.for
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[16]x[16]xi8> to i8
				// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i8 to i32
				// CHECK: "arm_sme.intr.st1b.horiz"({{.*}}, {{.*}}, %[[TILE_ID_I32]], {{.*}}) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()
				func.func @arm_sme_tile_store(%tile : vector<[16]x[16]xi8>, %dest : memref<?x?xi8>) {
				%c0 = arith.constant 0 : index
				arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
				return
				}

mlir/test/Dialect/ArmSME/roundtrip.mlir

	[345 lines hidden]
	// -----			// -----

	func.func @arm_sme_tile_store_f64(%tile : vector<[2]x[2]xf64>, %dest : memref<?x?xf64>) {			func.func @arm_sme_tile_store_f64(%tile : vector<[2]x[2]xf64>, %dest : memref<?x?xf64>) {
	// CHECK: arm_sme.tile_store {{.*}} : memref<?x?xf64>, vector<[2]x[2]xf64>			// CHECK: arm_sme.tile_store {{.*}} : memref<?x?xf64>, vector<[2]x[2]xf64>
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>			arm_sme.tile_store %tile, %dest[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>
	return			return
	}			}

				// -----

				func.func @arm_sme_load_tile_slice_i8(%src : memref<?x?xi8>, %tile : vector<[16]x[16]xi8>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xi8>, vector<[16]x[16]xi8>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xi8>, vector<[16]x[16]xi8>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_i16(%src : memref<?x?xi16>, %tile : vector<[8]x[8]xi16>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xi16>, vector<[8]x[8]xi16>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xi16>, vector<[8]x[8]xi16>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_i32(%src : memref<?x?xi32>, %tile : vector<[4]x[4]xi32>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xi32>, vector<[4]x[4]xi32>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xi32>, vector<[4]x[4]xi32>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_i64(%src : memref<?x?xi64>, %tile : vector<[2]x[2]xi64>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xi64>, vector<[2]x[2]xi64>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xi64>, vector<[2]x[2]xi64>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_i128(%src : memref<?x?xi128>, %tile : vector<[1]x[1]xi128>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xi128>, vector<[1]x[1]xi128>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xi128>, vector<[1]x[1]xi128>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_f16(%src : memref<?x?xf16>, %tile : vector<[8]x[8]xf16>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xf16>, vector<[8]x[8]xf16>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xf16>, vector<[8]x[8]xf16>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_bf16(%src : memref<?x?xbf16>, %tile : vector<[8]x[8]xbf16>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xbf16>, vector<[8]x[8]xbf16>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xbf16>, vector<[8]x[8]xbf16>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_f32(%src : memref<?x?xf32>, %tile : vector<[4]x[4]xf32>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xf32>, vector<[4]x[4]xf32>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xf32>, vector<[4]x[4]xf32>
				return
				}

				// -----

				func.func @arm_sme_load_tile_slice_f64(%src : memref<?x?xf64>, %tile : vector<[2]x[2]xf64>, %tile_slice_index : index) {
				// CHECK: arm_sme.load_tile_slice {{.*}} : memref<?x?xf64>, vector<[2]x[2]xf64>
				%c0 = arith.constant 0 : index
				%tile_update = arm_sme.load_tile_slice %src[%c0], %tile, %tile_slice_index : memref<?x?xf64>, vector<[2]x[2]xf64>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_i8(%tile : vector<[16]x[16]xi8>, %tile_slice_index : index, %dest : memref<?x?xi8>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xi8>, vector<[16]x[16]xi8>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_i16(%tile : vector<[8]x[8]xi16>, %tile_slice_index : index, %dest : memref<?x?xi16>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xi16>, vector<[8]x[8]xi16>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xi16>, vector<[8]x[8]xi16>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_i32(%tile : vector<[4]x[4]xi32>, %tile_slice_index : index, %dest : memref<?x?xi32>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xi32>, vector<[4]x[4]xi32>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_i64(%tile : vector<[2]x[2]xi64>, %tile_slice_index : index, %dest : memref<?x?xi64>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xi64>, vector<[2]x[2]xi64>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xi64>, vector<[2]x[2]xi64>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_i128(%tile : vector<[1]x[1]xi128>, %tile_slice_index : index, %dest : memref<?x?xi128>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xi128>, vector<[1]x[1]xi128>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xi128>, vector<[1]x[1]xi128>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_f16(%tile : vector<[8]x[8]xf16>, %tile_slice_index : index, %dest : memref<?x?xf16>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xf16>, vector<[8]x[8]xf16>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xf16>, vector<[8]x[8]xf16>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_bf16(%tile : vector<[8]x[8]xbf16>, %tile_slice_index : index, %dest : memref<?x?xbf16>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xbf16>, vector<[8]x[8]xbf16>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xbf16>, vector<[8]x[8]xbf16>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_f32(%tile : vector<[4]x[4]xf32>, %tile_slice_index : index, %dest : memref<?x?xf32>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xf32>, vector<[4]x[4]xf32>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xf32>, vector<[4]x[4]xf32>
				return
				}

				// -----

				func.func @arm_sme_store_tile_slice_f64(%tile : vector<[2]x[2]xf64>, %tile_slice_index : index, %dest : memref<?x?xf64>) -> () {
				// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xf64>, vector<[2]x[2]xf64>
				%c0 = arith.constant 0 : index
				arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xf64>, vector<[2]x[2]xf64>
				return
				}

mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir

	// RUN: mlir-opt %s -convert-vector-to-arm-sme -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize -split-input-file | FileCheck %s			// RUN: mlir-opt %s -convert-vector-to-arm-sme -convert-arm-sme-to-scf -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize -split-input-file | FileCheck %s

	// CHECK-LABEL: @transfer_write_2d_zero_i8(			// CHECK-LABEL: @transfer_write_2d_zero_i8(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)
	// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index			// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
	// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index			// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index
	// CHECK-DAG: %[[C255:.*]] = arith.constant 255 : i32			// CHECK-DAG: %[[C255:.*]] = arith.constant 255 : i32
	// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>			// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>
	// CHECK-DAG: "arm_sme.intr.zero"(%[[C255]]) : (i32) -> ()			// CHECK-DAG: "arm_sme.intr.zero"(%[[C255]]) : (i32) -> ()
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8
	// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64			// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64
	// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index			// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index
	// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index			// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index
	// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {			// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {
	// CHECK-NEXT: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64			// CHECK: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64
	// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64			// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64
	// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8			// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
	// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32			// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32
	// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i8 to i32			// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i8 to i32
	// CHECK-NEXT: "arm_sme.intr.st1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()			// CHECK-NEXT: "arm_sme.intr.st1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()
	func.func @transfer_write_2d_zero_i8(%arg0 : memref<?x?xi8>) {			func.func @transfer_write_2d_zero_i8(%arg0 : memref<?x?xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant dense<0> : vector<[16]x[16]xi8>			%cst = arith.constant dense<0> : vector<[16]x[16]xi8>
	vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>			vector.transfer_write %cst, %arg0[%c0, %c0] {in_bounds = [true, true]} : vector<[16]x[16]xi8>, memref<?x?xi8>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_i8(			// CHECK-LABEL: @vector_load_i8(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)
	// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index			// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
	// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index			// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index
	// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>			// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>
	// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64			// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64
	// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index			// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index
	// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index			// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index
	// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {			// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {
	// CHECK-NEXT: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64			// CHECK: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64
	// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64			// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64
	// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8			// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
	// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32			// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32
	// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i8 to i32			// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i8 to i32
	// CHECK-NEXT: "arm_sme.intr.ld1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()			// CHECK-NEXT: "arm_sme.intr.ld1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()
	// CHECK-NEXT: }			// CHECK-NEXT: }
	// CHECK-NEXT: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>
	// CHECK-NEXT: return %[[CAST_TILE_TO_VECTOR]] : vector<[16]x[16]xi8>			// CHECK-NEXT: return %[[CAST_TILE_TO_VECTOR]] : vector<[16]x[16]xi8>
	func.func @vector_load_i8(%arg0 : memref<?x?xi8>) -> vector<[16]x[16]xi8> {			func.func @vector_load_i8(%arg0 : memref<?x?xi8>) -> vector<[16]x[16]xi8> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
	return %tile : vector<[16]x[16]xi8>			return %tile : vector<[16]x[16]xi8>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_i8_from_rank_1_memref(			// CHECK-LABEL: @vector_load_i8_from_rank_1_memref(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?xi8>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?xi8>)
	// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?xi8> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>			// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?xi8> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i8
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index			// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
	// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index			// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index
	// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>			// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>
	// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64			// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64
	// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index			// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index
	// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index			// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index
	// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {			// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {
				// CHECK-NEXT: %[[VSCALE_1:.*]] = "llvm.intr.vscale"() : () -> i64
				// CHECK-NEXT: %[[VSCALE_IDX_1:.*]] = builtin.unrealized_conversion_cast %[[VSCALE_1]] : i64 to index
				// CHECK-NEXT: %[[SVL_B_1:.*]] = arith.muli %[[VSCALE_IDX_1]], %[[MIN_SVL_B]] : index
	// CHECK-NEXT: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64			// CHECK-NEXT: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64
	// CHECK-NEXT: %[[SVL_B_I64:.*]] = arith.index_castui %[[SVL_B]] : index to i64			// CHECK-NEXT: %[[SVL_B_I64:.*]] = arith.index_castui %[[SVL_B_1]] : index to i64
	// CHECK-NEXT: %[[TILE_SLICE_IDX:.*]] = arith.muli %[[TILE_SLICE_I64]], %[[SVL_B_I64]] : i64			// CHECK-NEXT: %[[TILE_SLICE_IDX:.*]] = arith.muli %[[TILE_SLICE_I64]], %[[SVL_B_I64]] : i64
	// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>			// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
	// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[TILE_SLICE_IDX]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8			// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[TILE_SLICE_IDX]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
	// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32			// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32
	// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i8 to i32			// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i8 to i32
	// CHECK-NEXT: "arm_sme.intr.ld1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()			// CHECK-NEXT: "arm_sme.intr.ld1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()
	// CHECK-NEXT: }			// CHECK-NEXT: }
	// CHECK-NEXT: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i8 to vector<[16]x[16]xi8>
	// CHECK-NEXT: return %[[CAST_TILE_TO_VECTOR]] : vector<[16]x[16]xi8>			// CHECK-NEXT: return %[[CAST_TILE_TO_VECTOR]] : vector<[16]x[16]xi8>
	func.func @vector_load_i8_from_rank_1_memref(%arg0 : memref<?xi8>) -> vector<[16]x[16]xi8> {			func.func @vector_load_i8_from_rank_1_memref(%arg0 : memref<?xi8>) -> vector<[16]x[16]xi8> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0] : memref<?xi8>, vector<[16]x[16]xi8>			%tile = vector.load %arg0[%c0] : memref<?xi8>, vector<[16]x[16]xi8>
	return %tile : vector<[16]x[16]xi8>			return %tile : vector<[16]x[16]xi8>
	}			}


	// -----			// -----

	// CHECK-LABEL: @vector_load_i16(			// CHECK-LABEL: @vector_load_i16(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi16>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi16>)
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i16			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i16
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i16 to vector<[8]x[8]xi16>
	// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index			// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index			// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index
	// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i16 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i16 to i32
	// CHECK: arm_sme.intr.ld1h.horiz			// CHECK: arm_sme.intr.ld1h.horiz
	// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i16 to vector<[8]x[8]xi16>
	func.func @vector_load_i16(%arg0 : memref<?x?xi16>) -> vector<[8]x[8]xi16> {			func.func @vector_load_i16(%arg0 : memref<?x?xi16>) -> vector<[8]x[8]xi16> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi16>, vector<[8]x[8]xi16>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi16>, vector<[8]x[8]xi16>
	return %tile : vector<[8]x[8]xi16>			return %tile : vector<[8]x[8]xi16>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_i32(			// CHECK-LABEL: @vector_load_i32(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi32>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi32>)
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i32			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i32
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i32 to vector<[4]x[4]xi32>
	// CHECK-DAG: %[[MIN_SVL_S:.*]] = arith.constant 4 : index			// CHECK-DAG: %[[MIN_SVL_S:.*]] = arith.constant 4 : index
	// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index			// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index
	// CHECK-NOT: arith.extui %[[TILE_ID]]			// CHECK-NOT: arith.extui %[[TILE_ID]]
	// CHECK-NOT: arith.trunci %[[TILE_ID]]			// CHECK-NOT: arith.trunci %[[TILE_ID]]
	// CHECK: arm_sme.intr.ld1w.horiz			// CHECK: arm_sme.intr.ld1w.horiz
	// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i32 to vector<[4]x[4]xi32>
	func.func @vector_load_i32(%arg0 : memref<?x?xi32>) -> vector<[4]x[4]xi32> {			func.func @vector_load_i32(%arg0 : memref<?x?xi32>) -> vector<[4]x[4]xi32> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
	return %tile : vector<[4]x[4]xi32>			return %tile : vector<[4]x[4]xi32>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_i64(			// CHECK-LABEL: @vector_load_i64(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi64>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi64>)
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i64			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i64
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i64 to vector<[2]x[2]xi64>
	// CHECK-DAG: %[[MIN_SVL_D:.*]] = arith.constant 2 : index			// CHECK-DAG: %[[MIN_SVL_D:.*]] = arith.constant 2 : index
	// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index			// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index
	// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[TILE_ID]] : i64 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[TILE_ID]] : i64 to i32
	// CHECK: arm_sme.intr.ld1d.horiz			// CHECK: arm_sme.intr.ld1d.horiz
	// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i64 to vector<[2]x[2]xi64>
	func.func @vector_load_i64(%arg0 : memref<?x?xi64>) -> vector<[2]x[2]xi64> {			func.func @vector_load_i64(%arg0 : memref<?x?xi64>) -> vector<[2]x[2]xi64> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi64>, vector<[2]x[2]xi64>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xi64>, vector<[2]x[2]xi64>
	return %tile : vector<[2]x[2]xi64>			return %tile : vector<[2]x[2]xi64>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_f16(			// CHECK-LABEL: @vector_load_f16(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf16>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf16>)
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i16			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i16
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i16 to vector<[8]x[8]xf16>
	// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index			// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index			// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index
	// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i16 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i16 to i32
	// CHECK: arm_sme.intr.ld1h.horiz			// CHECK: arm_sme.intr.ld1h.horiz
	// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i16 to vector<[8]x[8]xf16>
	func.func @vector_load_f16(%arg0 : memref<?x?xf16>) -> vector<[8]x[8]xf16> {			func.func @vector_load_f16(%arg0 : memref<?x?xf16>) -> vector<[8]x[8]xf16> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xf16>, vector<[8]x[8]xf16>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xf16>, vector<[8]x[8]xf16>
	return %tile : vector<[8]x[8]xf16>			return %tile : vector<[8]x[8]xf16>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_bf16(			// CHECK-LABEL: @vector_load_bf16(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xbf16>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xbf16>)
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i16			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i16
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i16 to vector<[8]x[8]xbf16>
	// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index			// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index			// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index
	// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i16 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[TILE_ID]] : i16 to i32
	// CHECK: arm_sme.intr.ld1h.horiz			// CHECK: arm_sme.intr.ld1h.horiz
	// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i16 to vector<[8]x[8]xbf16>
	func.func @vector_load_bf16(%arg0 : memref<?x?xbf16>) -> vector<[8]x[8]xbf16> {			func.func @vector_load_bf16(%arg0 : memref<?x?xbf16>) -> vector<[8]x[8]xbf16> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xbf16>, vector<[8]x[8]xbf16>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xbf16>, vector<[8]x[8]xbf16>
	return %tile : vector<[8]x[8]xbf16>			return %tile : vector<[8]x[8]xbf16>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_f32(			// CHECK-LABEL: @vector_load_f32(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32>)
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i32			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i32
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i32 to vector<[4]x[4]xf32>
	// CHECK-DAG: %[[MIN_SVL_S:.*]] = arith.constant 4 : index			// CHECK-DAG: %[[MIN_SVL_S:.*]] = arith.constant 4 : index
	// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index			// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index
	// CHECK-NOT: arith.extui %[[TILE_ID]]			// CHECK-NOT: arith.extui %[[TILE_ID]]
	// CHECK-NOT: arith.trunci %[[TILE_ID]]			// CHECK-NOT: arith.trunci %[[TILE_ID]]
	// CHECK: arm_sme.intr.ld1w.horiz			// CHECK: arm_sme.intr.ld1w.horiz
	// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i32 to vector<[4]x[4]xf32>
	func.func @vector_load_f32(%arg0 : memref<?x?xf32>) -> vector<[4]x[4]xf32> {			func.func @vector_load_f32(%arg0 : memref<?x?xf32>) -> vector<[4]x[4]xf32> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xf32>, vector<[4]x[4]xf32>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xf32>, vector<[4]x[4]xf32>
	return %tile : vector<[4]x[4]xf32>			return %tile : vector<[4]x[4]xf32>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_load_f64(			// CHECK-LABEL: @vector_load_f64(
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf64>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf64>)
	// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i64			// CHECK-DAG: %[[TILE_ID:.*]] = arm_sme.get_tile_id : i64
				// CHECK-DAG: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i64 to vector<[2]x[2]xf64>
	// CHECK-DAG: %[[MIN_SVL_D:.*]] = arith.constant 2 : index			// CHECK-DAG: %[[MIN_SVL_D:.*]] = arith.constant 2 : index
	// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index			// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index
	// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[TILE_ID]] : i64 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[TILE_ID]] : i64 to i32
	// CHECK: arm_sme.intr.ld1d.horiz			// CHECK: arm_sme.intr.ld1d.horiz
	// CHECK: %[[CAST_TILE_TO_VECTOR:.*]] = arm_sme.cast_tile_to_vector %[[TILE_ID]] : i64 to vector<[2]x[2]xf64>
	func.func @vector_load_f64(%arg0 : memref<?x?xf64>) -> vector<[2]x[2]xf64> {			func.func @vector_load_f64(%arg0 : memref<?x?xf64>) -> vector<[2]x[2]xf64> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%tile = vector.load %arg0[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>			%tile = vector.load %arg0[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>
	return %tile : vector<[2]x[2]xf64>			return %tile : vector<[2]x[2]xf64>
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_store_i8(			// CHECK-LABEL: @vector_store_i8(
	// CHECK-SAME: %[[TILE:.*]]: vector<[16]x[16]xi8>,			// CHECK-SAME: %[[TILE:.*]]: vector<[16]x[16]xi8>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi8>)
	// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-DAG: %[[MEM_DESC:.*]] = builtin.unrealized_conversion_cast %[[ARG0]] : memref<?x?xi8> to !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[16]x[16]xi8> to i8
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index			// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
	// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index			// CHECK-DAG: %[[MIN_SVL_B:.*]] = arith.constant 16 : index
	// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>			// CHECK-DAG: %[[PTRUE_ALL:.*]] = arith.constant dense<true> : vector<[16]xi1>
	// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64			// CHECK-DAG: %[[VSCALE:.*]] = "llvm.intr.vscale"() : () -> i64
	// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index			// CHECK-NEXT: %[[VSCALE_IDX:.*]] = builtin.unrealized_conversion_cast %[[VSCALE]] : i64 to index
	// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index			// CHECK-NEXT: %[[SVL_B:.*]] = arith.muli %[[VSCALE_IDX]], %[[MIN_SVL_B]] : index
	// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {			// CHECK-NEXT: scf.for %[[TILE_SLICE:.*]] = %[[C0]] to %[[SVL_B]] step %[[C1]] {
	// CHECK-NEXT: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64			// CHECK-NEXT: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[16]x[16]xi8> to i8
				// CHECK: %[[TILE_SLICE_I64:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i64
	// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-NEXT: %[[ALIGNED_BASE:.*]] = llvm.extractvalue %[[MEM_DESC]][1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>			// CHECK-NEXT: %[[STRIDE0:.*]] = llvm.extractvalue %[[MEM_DESC]][4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
	// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64			// CHECK-NEXT: %[[OFF0:.*]] = llvm.mul %[[TILE_SLICE_I64]], %[[STRIDE0]] : i64
	// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8			// CHECK-NEXT: %[[GEP:.*]] = llvm.getelementptr %[[ALIGNED_BASE]]{{\[}}%[[OFF0]]] : (!llvm.ptr, i64) -> !llvm.ptr, i8
	// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32			// CHECK-NEXT: %[[TILE_SLICE_I32:.*]] = arith.index_castui %[[TILE_SLICE]] : index to i32
	// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i8 to i32			// CHECK-NEXT: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i8 to i32
	// CHECK-NEXT: "arm_sme.intr.st1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()			// CHECK-NEXT: "arm_sme.intr.st1b.horiz"(%[[PTRUE_ALL]], %[[GEP]], %[[TILE_ID_I32]], %[[TILE_SLICE_I32]]) : (vector<[16]xi1>, !llvm.ptr, i32, i32) -> ()
	// CHECK-NEXT: }			// CHECK-NEXT: }
	// CHECK-NEXT: return			// CHECK-NEXT: return
	func.func @vector_store_i8(%tile : vector<[16]x[16]xi8>, %arg0 : memref<?x?xi8>) {			func.func @vector_store_i8(%tile : vector<[16]x[16]xi8>, %arg0 : memref<?x?xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_store_i16(			// CHECK-LABEL: @vector_store_i16(
	// CHECK-SAME: %[[TILE:.*]]: vector<[8]x[8]xi16>,			// CHECK-SAME: %[[TILE:.*]]: vector<[8]x[8]xi16>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi16>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi16>)
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[8]x[8]xi16> to i16			// CHECK: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index			// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[8]x[8]xi16> to i16
	// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i16 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i16 to i32
	// CHECK: arm_sme.intr.st1h.horiz			// CHECK: arm_sme.intr.st1h.horiz
	func.func @vector_store_i16(%tile : vector<[8]x[8]xi16>, %arg0 : memref<?x?xi16>) {			func.func @vector_store_i16(%tile : vector<[8]x[8]xi16>, %arg0 : memref<?x?xi16>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi16>, vector<[8]x[8]xi16>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi16>, vector<[8]x[8]xi16>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_store_i32(			// CHECK-LABEL: @vector_store_i32(
	// CHECK-SAME: %[[TILE:.*]]: vector<[4]x[4]xi32>,			// CHECK-SAME: %[[TILE:.*]]: vector<[4]x[4]xi32>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi32>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi32>)
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[4]x[4]xi32> to i32			// CHECK: %[[MIN_SVL_S:.*]] = arith.constant 4 : index
	// CHECK-DAG: %[[MIN_SVL_S:.*]] = arith.constant 4 : index
	// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index			// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[4]x[4]xi32> to i32
	// CHECK-NOT: arith.extui %[[CAST_VECTOR_TO_TILE]]			// CHECK-NOT: arith.extui %[[CAST_VECTOR_TO_TILE]]
	// CHECK-NOT: arith.trunci %[[CAST_VECTOR_TO_TILE]]			// CHECK-NOT: arith.trunci %[[CAST_VECTOR_TO_TILE]]
	// CHECK: arm_sme.intr.st1w.horiz			// CHECK: arm_sme.intr.st1w.horiz
	func.func @vector_store_i32(%tile : vector<[4]x[4]xi32>, %arg0 : memref<?x?xi32>) {			func.func @vector_store_i32(%tile : vector<[4]x[4]xi32>, %arg0 : memref<?x?xi32>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi32>, vector<[4]x[4]xi32>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_store_i64(			// CHECK-LABEL: @vector_store_i64(
	// CHECK-SAME: %[[TILE:.*]]: vector<[2]x[2]xi64>,			// CHECK-SAME: %[[TILE:.*]]: vector<[2]x[2]xi64>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi64>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xi64>)
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[2]x[2]xi64> to i64			// CHECK: %[[MIN_SVL_D:.*]] = arith.constant 2 : index
	// CHECK-DAG: %[[MIN_SVL_D:.*]] = arith.constant 2 : index
	// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index			// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[2]x[2]xi64> to i64
	// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[CAST_VECTOR_TO_TILE]] : i64 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[CAST_VECTOR_TO_TILE]] : i64 to i32
	// CHECK: arm_sme.intr.st1d.horiz			// CHECK: arm_sme.intr.st1d.horiz
	func.func @vector_store_i64(%tile : vector<[2]x[2]xi64>, %arg0 : memref<?x?xi64>) {			func.func @vector_store_i64(%tile : vector<[2]x[2]xi64>, %arg0 : memref<?x?xi64>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi64>, vector<[2]x[2]xi64>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xi64>, vector<[2]x[2]xi64>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_store_f16(			// CHECK-LABEL: @vector_store_f16(
	// CHECK-SAME: %[[TILE:.*]]: vector<[8]x[8]xf16>,			// CHECK-SAME: %[[TILE:.*]]: vector<[8]x[8]xf16>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf16>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf16>)
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[8]x[8]xf16> to i16			// CHECK: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index			// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[8]x[8]xf16> to i16
	// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i16 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i16 to i32
	// CHECK: arm_sme.intr.st1h.horiz			// CHECK: arm_sme.intr.st1h.horiz
	func.func @vector_store_f16(%tile : vector<[8]x[8]xf16>, %arg0 : memref<?x?xf16>) {			func.func @vector_store_f16(%tile : vector<[8]x[8]xf16>, %arg0 : memref<?x?xf16>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xf16>, vector<[8]x[8]xf16>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xf16>, vector<[8]x[8]xf16>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_store_bf16(			// CHECK-LABEL: @vector_store_bf16(
	// CHECK-SAME: %[[TILE:.*]]: vector<[8]x[8]xbf16>,			// CHECK-SAME: %[[TILE:.*]]: vector<[8]x[8]xbf16>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xbf16>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xbf16>)
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[8]x[8]xbf16> to i16			// CHECK: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK-DAG: %[[MIN_SVL_H:.*]] = arith.constant 8 : index
	// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index			// CHECK: %[[SVL_H:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_H]] : index
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[8]x[8]xbf16> to i16
	// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i16 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.extui %[[CAST_VECTOR_TO_TILE]] : i16 to i32
	// CHECK: arm_sme.intr.st1h.horiz			// CHECK: arm_sme.intr.st1h.horiz
	func.func @vector_store_bf16(%tile : vector<[8]x[8]xbf16>, %arg0 : memref<?x?xbf16>) {			func.func @vector_store_bf16(%tile : vector<[8]x[8]xbf16>, %arg0 : memref<?x?xbf16>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xbf16>, vector<[8]x[8]xbf16>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xbf16>, vector<[8]x[8]xbf16>
	return			return
	}			}
	// -----			// -----

	// CHECK-LABEL: @vector_store_f32(			// CHECK-LABEL: @vector_store_f32(
	// CHECK-SAME: %[[TILE:.*]]: vector<[4]x[4]xf32>,			// CHECK-SAME: %[[TILE:.*]]: vector<[4]x[4]xf32>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf32>)
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[4]x[4]xf32> to i32			// CHECK: %[[MIN_SVL_S:.*]] = arith.constant 4 : index
	// CHECK-DAG: %[[MIN_SVL_S:.*]] = arith.constant 4 : index
	// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index			// CHECK: %[[SVL_S:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_S]] : index
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[4]x[4]xf32> to i32
	// CHECK-NOT: arith.extui %[[CAST_VECTOR_TO_TILE]]			// CHECK-NOT: arith.extui %[[CAST_VECTOR_TO_TILE]]
	// CHECK-NOT: arith.trunci %[[CAST_VECTOR_TO_TILE]]			// CHECK-NOT: arith.trunci %[[CAST_VECTOR_TO_TILE]]
	// CHECK: arm_sme.intr.st1w.horiz			// CHECK: arm_sme.intr.st1w.horiz
	func.func @vector_store_f32(%tile : vector<[4]x[4]xf32>, %arg0 : memref<?x?xf32>) {			func.func @vector_store_f32(%tile : vector<[4]x[4]xf32>, %arg0 : memref<?x?xf32>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xf32>, vector<[4]x[4]xf32>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xf32>, vector<[4]x[4]xf32>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: @vector_store_f64(			// CHECK-LABEL: @vector_store_f64(
	// CHECK-SAME: %[[TILE:.*]]: vector<[2]x[2]xf64>,			// CHECK-SAME: %[[TILE:.*]]: vector<[2]x[2]xf64>,
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf64>)			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?xf64>)
	// CHECK-DAG: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[2]x[2]xf64> to i64			// CHECK: %[[MIN_SVL_D:.*]] = arith.constant 2 : index
	// CHECK-DAG: %[[MIN_SVL_D:.*]] = arith.constant 2 : index
	// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index			// CHECK: %[[SVL_D:.*]] = arith.muli %{{.*}}, %[[MIN_SVL_D]] : index
				// CHECK: %[[CAST_VECTOR_TO_TILE:.*]] = arm_sme.cast_vector_to_tile %[[TILE]] : vector<[2]x[2]xf64> to i64
	// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[CAST_VECTOR_TO_TILE]] : i64 to i32			// CHECK: %[[TILE_ID_I32:.*]] = arith.trunci %[[CAST_VECTOR_TO_TILE]] : i64 to i32
	// CHECK: arm_sme.intr.st1d.horiz			// CHECK: arm_sme.intr.st1d.horiz
	func.func @vector_store_f64(%tile : vector<[2]x[2]xf64>, %arg0 : memref<?x?xf64>) {			func.func @vector_store_f64(%tile : vector<[2]x[2]xf64>, %arg0 : memref<?x?xf64>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.store %tile, %arg0[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>			vector.store %tile, %arg0[%c0, %c0] : memref<?x?xf64>, vector<[2]x[2]xf64>
	return			return
	}			}

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-load-store.mlir

	// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \			// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
	// RUN: -convert-vector-to-arm-sme -convert-vector-to-llvm="enable-arm-sme" \			// RUN: -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
				// RUN: -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
	// RUN: -allocate-arm-sme-tiles -test-lower-to-llvm | \			// RUN: -allocate-arm-sme-tiles -test-lower-to-llvm | \
	// RUN: mlir-translate -mlir-to-llvmir | \			// RUN: mlir-translate -mlir-to-llvmir | \
	// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \			// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
	// RUN: --entry-function=za0_d_f64 \			// RUN: --entry-function=za0_d_f64 \
	// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext | \			// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext | \
	// RUN: FileCheck %s --check-prefix=CHECK-ZA0_D			// RUN: FileCheck %s --check-prefix=CHECK-ZA0_D

	// Integration test demonstrating load/store to/from SME ZA tile.			// Integration test demonstrating load/store to/from SME ZA tile.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/vector-ops.mlir

	// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \			// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
	// RUN: -convert-vector-to-arm-sme -convert-vector-to-llvm="enable-arm-sme" \			// RUN: -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
				// RUN: -convert-vector-to-llvm="enable-arm-sme" \
	// RUN: -allocate-arm-sme-tiles -test-lower-to-llvm | \			// RUN: -allocate-arm-sme-tiles -test-lower-to-llvm | \
	// RUN: mlir-translate -mlir-to-llvmir | \			// RUN: mlir-translate -mlir-to-llvmir | \
	// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \			// RUN: %lli_aarch64_cmd --march=aarch64 --mattr="+sve,+sme" \
	// RUN: --entry-function=entry \			// RUN: --entry-function=entry \
	// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext | \			// RUN: --dlopen=%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext | \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @entry() -> i32 {			func.func @entry() -> i32 {

