This is an archive of the discontinued LLVM Phabricator instance.

Differential D157005

[mlir][ArmSME] Add move vector to tile slice op and lowerings
ClosedPublic

Authored by c-rhodes on Aug 3 2023, 7:33 AM.

Details

Reviewers

awarzynski
benmxwl-arm
aartbik
ftynse
dcaballe
nicolasvasilache

Commits

rG3b4b6cbba5e0: [mlir][ArmSME] Add move vector to tile slice op and lowerings

Summary

This adds a 'move_vector_to_tile_slice' op to the ArmSME dialect that
moves a 1-D scalable vector to a slice of a 2-D tile at a given index.

This is lowered to the 'llvm.aarch64.sme.write.horiz' intrinsic that
maps to the MOVA (vector to tile, single) SME instruction [1] when
lowering to LLVM. Like the SME load and store instructions this operates
on ZA tile slices, which are 1-D vectors of horizontally or vertically
contiguous elements within a ZA tile.
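
The semantics described above can be sketched in plain C++. This is only an illustrative model, not MLIR or LLVM code: the `Tile` alias and the `moveVectorToTileSlice` function are hypothetical names, and the tile is modelled as a square array whose rows are the horizontal slices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ZA tile: a square 2-D array whose rows are the
// horizontal tile slices. This is not an MLIR or LLVM type.
template <typename T>
using Tile = std::vector<std::vector<T>>;

// Models the op's semantics: write `vec` into the horizontal slice (row)
// `sliceIndex` of `tile` and return the updated tile. The tile is taken by
// value, mirroring the op returning the updated tile as a new result.
template <typename T>
Tile<T> moveVectorToTileSlice(const std::vector<T> &vec, Tile<T> tile,
                              std::size_t sliceIndex) {
  assert(sliceIndex < tile.size() && "slice index out of range");
  assert(vec.size() == tile[sliceIndex].size() &&
         "vector length must match the tile slice length");
  tile[sliceIndex] = vec;
  return tile;
}
```

Calling `moveVectorToTileSlice<std::int32_t>({1, 2, 3, 4}, tile, 2)` on a 4x4 tile yields a copy of the tile whose third row holds the vector, while the input tile is left unchanged.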

This patch extends the lowering of 'arith.constant' to SME to support
non-zero constants using this new op. This requires materializing a
loop that broadcasts the constant to each tile slice with the
'move_vector_to_tile_slice' op. Unlike load and store, this is done during
conversion from Vector to ArmSME, rather than ArmSME to SCF. The latter
would require a higher-level custom op in the ArmSME dialect like
'tile_load' and 'tile_store' and this isn't necessary. We may also
remove the load and store ops in the future in favour of lowering
straight from Vector, at which point this would converge.
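
A rough C++ model of the materialized loop may help here. It is a sketch under stated assumptions rather than the patch's code: `fillTile` and `minTileSliceElements` are hypothetical names, and the 128-bit minimum streaming vector length is an assumption chosen to match the types in this patch (`vector<[16]xi8>`, `vector<[2]xf64>`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the minimum streaming vector length is 128 bits, which matches
// the types in this patch (vector<[16]xi8>, vector<[2]xf64>).
constexpr unsigned kMinStreamingVectorLengthInBits = 128;

// Minimum number of elements in one tile slice for a given element width.
constexpr unsigned minTileSliceElements(unsigned elementBitWidth) {
  return kMinStreamingVectorLengthInBits / elementBitWidth;
}

// Hypothetical model of the loop: broadcast `value` into every horizontal
// slice of a square tile, one slice per iteration.
template <typename T>
std::vector<std::vector<T>> fillTile(T value, unsigned vscale) {
  // Loop upper bound: minimum slice count scaled by the runtime vscale.
  const std::size_t numTileSlices =
      static_cast<std::size_t>(minTileSliceElements(sizeof(T) * 8)) * vscale;
  // The 1-D splat constant that is moved into each slice.
  const std::vector<T> splat(numTileSlices, value);
  std::vector<std::vector<T>> tile(numTileSlices,
                                   std::vector<T>(numTileSlices));
  for (std::size_t slice = 0; slice < numTileSlices; ++slice)
    tile[slice] = splat; // stands in for one move into tile slice `slice`
  return tile;
}
```

The loop bound is the only runtime quantity: the minimum slice count is fixed by the element width, and multiplying it by `vscale` gives the number of slices for the actual hardware vector length.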

Currently only horizontal tile slices are supported. A future patch will
extend this mechanism to support 'vector.broadcast'.

Depends on D156980 D157004

[1] https://developer.arm.com/documentation/ddi0602

Event Timeline

c-rhodes created this revision. · Aug 3 2023, 7:33 AM

Herald added a reviewer: aartbik. · Aug 3 2023, 7:33 AM

Herald added a reviewer: ftynse.

Herald added a reviewer: dcaballe.

Herald added a project: Restricted Project.

Herald added subscribers: gysit, Dinistro, bviyer and 26 others.

c-rhodes requested review of this revision. · Aug 3 2023, 7:33 AM

Herald added a reviewer: nicolasvasilache. · Aug 3 2023, 7:33 AM

Herald added subscribers: alextsao1999, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B250064: Diff 546856. · Aug 3 2023, 7:34 AM

c-rhodes edited the summary of this revision. · Aug 3 2023, 7:35 AM

c-rhodes added a parent revision: D156980: [mlir][ArmSME] Extend arm_sme.zero for all types.

LGTM in general, thanks!

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
424	-> element type?
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
332–337	We are usually creating masks with `vector.create_mask` and `vector.constant_mask` but maybe it's already too late to introduce them here. I would expect them to have been lowered in an earlier stage. Just bringing this up in case you thought about it. (I'm not asking for changes)

This revision is now accepted and ready to land. · Aug 8 2023, 2:47 PM

A few minor points/questions. LG otherwise

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417	Most Ops' names convey what the corresponding "action"/"function" is. Wondering whether this shouldn't be `MoveVectorToTileSliceOp` instead. Naming is hard.
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
106	[nit] Just to make the split into 2 cases more visible.
112	[nit] Just to make the split into 2 cases more visible.
137	IIUC, the following block will only create the loop structure. The following operation is not created here: loads each ZA tile slice from memory. That's created further down. Also, rather than "loading from memory", this is creating "move from vector to a slice", right?
mlir/test/Dialect/ArmSME/roundtrip.mlir
586	Could you add some invalid cases in "invalid.mlir"?
mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir
251 ↗	(On Diff #546856)	It would be good to check at least one more element type.
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
55	[nit] I would use some other value - 1 is super common and can be easily missed. Here it would be nice to emphasise that it could be _anything_.

Herald added a subscriber: sunshaoce. · Aug 21 2023, 1:11 AM

benmxwl-arm added inline comments. · Aug 21 2023, 10:10 AM

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
75	It's landed :)

c-rhodes added inline comments. · Aug 22 2023, 1:08 AM

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417	> Most Ops' names convey what the corresponding "action"/"function" is. Wondering whether this shouldn't be `MoveVectorToTileSliceOp` instead. Naming is hard.
	Are you suggesting to rename it both internally and externally (i.e. `move_vector_to_tile_slice`) or just the former?
424	> -> element type?
	For a 2-d scalable vector such as `vector<[4]x[4]xi32>` is the element type `vector<[4]xi32>`? That's what I want to express here but I do struggle with these descriptions. FWIW `getElementType()` returns the scalar `i32`. Perhaps `The 1-D vector type must match the vector type of the inner dimension of the 2-D vector type`?
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
75	> It's landed :)
	Thanks for the heads-up, I'll update this.

Address most comments.
Add type constraint to VectorToTileSliceOp that verifies 1-D vector type matches element type of 2-D vector type (tile).
Rename VectorToTileSliceOp::getVectorType() -> VectorToTileSliceOp::getTileType() to prevent confusion with vector operand.
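
The type constraint mentioned in this update can be illustrated with a small standalone C++ model. It is a simplified sketch, not the TableGen constraint itself: `SimpleVectorType`, `tileSliceTypeOf` and `vectorMatchesTile` are hypothetical names, and scalability flags are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified vector type: shape plus element type name. This is
// not mlir::VectorType; scalability flags are omitted for brevity.
struct SimpleVectorType {
  std::vector<std::int64_t> shape;
  std::string elementType;
  bool operator==(const SimpleVectorType &other) const {
    return shape == other.shape && elementType == other.elementType;
  }
};

// Derive the tile slice type: drop the leading dimension of the 2-D tile type
// and keep its element type.
SimpleVectorType tileSliceTypeOf(const SimpleVectorType &tile) {
  assert(tile.shape.size() == 2 && "a tile is modelled as a 2-D vector");
  std::vector<std::int64_t> sliceShape(tile.shape.begin() + 1,
                                       tile.shape.end());
  return SimpleVectorType{sliceShape, tile.elementType};
}

// The constraint: the moved vector's type must equal the tile slice type.
bool vectorMatchesTile(const SimpleVectorType &vec,
                       const SimpleVectorType &tile) {
  return vec == tileSliceTypeOf(tile);
}
```

With a tile modelled as `{{4, 4}, "i32"}`, a vector `{{4}, "i32"}` satisfies the check, while `{{4}, "f32"}` or `{{8}, "i32"}` does not.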

c-rhodes marked 5 inline comments as done. · Aug 22 2023, 1:35 AM

c-rhodes added inline comments.

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
137	> IIUC, the following block will only create the loop structure. The following operation is not created here: loads each ZA tile slice from memory. That's created further down. Also, rather than "loading from memory", this is creating "move from vector to a slice", right?
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
332–337	> We are usually creating masks with `vector.create_mask` and `vector.constant_mask` but maybe it's already too late to introduce them here. I would expect them to have been lowered in an earlier stage. Just bringing this up in case you thought about it. (I'm not asking for changes)
	I hadn't thought about that, thanks for mentioning. I can't see why it wouldn't work since a vector op (splat) is already added here, but I'm not really sure it simplifies this.
mlir/test/Dialect/ArmSME/roundtrip.mlir
586	> Could you add some invalid cases in "invalid.mlir"?
	Done, was also missing a type constraint that verifies 1-d vector type matches inner vector type of 2-d vector type.

awarzynski added inline comments. · Aug 22 2023, 2:03 AM

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417	I was thinking both.
424	I also thought that you meant the element type here, as in e.g. `i32` or `f32`. I think that we need to be more explicit here: The type of the 1-d scalable vector to be moved must match the type of the tile slice (note that 1 slice is effectively 1 row in a virtual tile). WDYT?

Harbormaster completed remote builds in B254025: Diff 552261. · Aug 22 2023, 2:21 AM

Update descriptions for op and type constraint.

c-rhodes added inline comments. · Aug 22 2023, 6:50 AM

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417	> I was thinking both.
	Still considering, it's more descriptive but also it's quite a long name.
424	> I also thought that you meant the element type here, as in e.g. `i32` or `f32`. I think that we need to be more explicit here: The type of the 1-d scalable vector to be moved must match the type of the tile slice (note that 1 slice is effectively 1 row in a virtual tile). WDYT?
	I've updated the description along these lines, hope it makes more sense now, appreciate the input!

Harbormaster completed remote builds in B254080: Diff 552337. · Aug 22 2023, 7:09 AM

c-rhodes added a child revision: D158586: [mlir][ArmSME] Lower vector.broadcast to ArmSME. · Aug 23 2023, 12:55 AM

Rename vector_to_tile_slice to move_vector_to_tile_slice

c-rhodes marked an inline comment as done. · Aug 24 2023, 2:28 AM

Harbormaster completed remote builds in B254580: Diff 553048. · Aug 24 2023, 3:14 AM

dcaballe added inline comments. · Aug 27 2023, 9:45 PM

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
424	I would call that inner 1-D sub-vector?
430	1-d -> 1-D for consistency?
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
332–337	Well, if this is expected to run before some of these "higher" level vector ops are lowered, I would make sure we generate those. We have patterns that are looking specifically for those and are able to understand if they are all-true/all-false masks, etc. We don't have mask patterns looking at constant ops.

LGTM, thanks!

Following on from what Diego suggested re vector.mask, we probably should look into using those instead of vector.splat. At some point soon, not in this patch :)

c-rhodes added inline comments. · Aug 29 2023, 1:47 AM

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
424	> I would call that inner 1-D sub-vector?
	Hm, it's not an inner 1-D sub-vector until it's moved, I think that would make sense for the inverse of this operation (extract) that does tile slice to vector.
430	> 1-d -> 1-D for consistency?
	Ah, good spot, will fix this before landing, cheers.
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
332–337	> Well, if this is expected to run before some of these "higher" level vector ops are lowered, I would make sure we generate those. We have patterns that are looking specifically for those and are able to understand if they are all-true/all-false masks, etc. We don't have mask patterns looking at constant ops.
	Sorry, I'm not sure I follow. This is low-level and emits the canonical form for an all-active mask (a constant?), so it's not clear to me what the benefit of such patterns is in this case. I should mention there has been no consideration of masking so far in the ops introduced for our first target for SME of lowering linalg.fill; now that there is a basic path established (D158619) I've started looking into masking.
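
For readers following the mask discussion, the role of the predicate in a tile slice write can be sketched in standalone C++. This is a hypothetical model (`writeHorizontalSlice` is not an MLIR or LLVM function), and it assumes that inactive lanes leave the destination slice unmodified; the lowering in this patch always passes an all-active mask, so every lane is written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a predicated write to a horizontal tile slice.
// Assumption: lanes whose mask bit is false keep their previous value.
template <typename T>
void writeHorizontalSlice(std::vector<std::vector<T>> &tile,
                          std::size_t sliceIndex,
                          const std::vector<bool> &mask,
                          const std::vector<T> &vec) {
  assert(sliceIndex < tile.size() && "slice index out of range");
  std::vector<T> &slice = tile[sliceIndex];
  assert(mask.size() == slice.size() && vec.size() == slice.size());
  for (std::size_t lane = 0; lane < slice.size(); ++lane)
    if (mask[lane]) // only active lanes are written
      slice[lane] = vec[lane];
}
```

With an all-true mask the call overwrites the whole slice, which is the behaviour the splat of `true` in the lowering models; a partial mask would only matter once masking support is added.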

Closed by commit rG3b4b6cbba5e0: [mlir][ArmSME] Add move vector to tile slice op and lowerings (authored by c-rhodes). · Aug 29 2023, 2:38 AM

This revision was automatically updated to reflect the committed changes.

c-rhodes added a commit: rG3b4b6cbba5e0: [mlir][ArmSME] Add move vector to tile slice op and lowerings.

nicolasvasilache added inline comments. · Aug 29 2023, 7:26 AM

mlir/test/Dialect/ArmSME/arith-ops-to-sme.mlir
107	Side note: can you just return the value instead of using this fake op, or is there something more fundamental that does not let us return a scalable vector here?

c-rhodes added inline comments. · Aug 29 2023, 9:09 AM

mlir/test/Dialect/ArmSME/arith-ops-to-sme.mlir
107	> Side note: can you just return the value instead of using this fake op, or is there something more fundamental that does not let us return a scalable vector here?
	For the purposes of this test the scalable vector could be returned, but generally we can't support passing or returning 2-d scalable vectors to/from functions since these types can't be lowered to LLVM, and even if they could, it's not defined by the ABI. For this reason I opted for the fake use op so as to not set a precedent that this is something that can be done. I believe there are some earlier ArmSME tests where 2-d scalable vectors are returned, however; we should probably update them for consistency.

Revision Contents

Path	Size
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td	43 lines
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp	60 lines
mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp	62 lines
mlir/test/Dialect/ArmSME/arith-ops-to-sme.mlir	44 lines
mlir/test/Dialect/ArmSME/invalid.mlir	18 lines
mlir/test/Dialect/ArmSME/roundtrip.mlir	81 lines
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir	78 lines

Diff 552261

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td

[408 lines not shown]	def StoreTileSliceOp : ArmSME_Op<"store_tile_slice"> {
    }];

    let assemblyFormat = [{
      $tile `,` $tile_slice_index `,` $base `[` $indices `]`
      attr-dict `:` type($base) `,` type($tile)
    }];
  }

+ def VectorToTileSliceOp : ArmSME_Op<"vector_to_tile_slice", [
    [inline] awarzynski (Not Done): Most Ops' names convey what the corresponding "action"/"function" is. Wondering whether this shouldn't be `MoveVectorToTileSliceOp` instead. Naming is hard.
    [inline] c-rhodes (Author, Not Done): Are you suggesting to rename it both internally and externally (i.e. `move_vector_to_tile_slice`) or just the former?
    [inline] awarzynski (Done): I was thinking both.
    [inline] c-rhodes (Author, Not Done): Still considering, it's more descriptive but also it's quite a long name.
+     AllTypesMatch<["tile", "result"]>,
+     TypesMatchWith<
+         "type of 'vector' matches element type of 'tile'",
+         "tile", "vector",
+         "VectorType::get("
+           "::llvm::cast<mlir::VectorType>($_self).getShape().drop_front(),"
+           "::llvm::cast<mlir::VectorType>($_self).getElementType(),"
    [inline] dcaballe (Not Done): -> element type?
    [inline] c-rhodes (Author, Not Done): For a 2-d scalable vector such as `vector<[4]x[4]xi32>` is the element type `vector<[4]xi32>`? That's what I want to express here but I do struggle with these descriptions. FWIW `getElementType()` returns the scalar `i32`. Perhaps `The 1-D vector type must match the vector type of the inner dimension of the 2-D vector type`?
    [inline] awarzynski (Not Done): I also thought that you meant the element type here, as in e.g. `i32` or `f32`. I think that we need to be more explicit here: The type of the 1-d scalable vector to be moved must match the type of the tile slice (note that 1 slice is effectively 1 row in a virtual tile). WDYT?
    [inline] c-rhodes (Author, Done): I've updated the description along these lines, hope it makes more sense now, appreciate the input!
    [inline] dcaballe (Not Done): I would call that inner 1-D sub-vector?
    [inline] c-rhodes (Author, Done): Hm, it's not an inner 1-D sub-vector until it's moved, I think that would make sense for the inverse of this operation (extract) that does tile slice to vector.
+           "/*scalableDims=*/{true})">,
+ ]> {
+   let summary = "Move 1-D scalable vector to slice of 2-D tile";
+   let description = [{
+     The vector to tile slice operation moves a 1-D scalable vector to a tile
+     (2-D scalable vector) slice at the given index. The 1-D vector type must
    [inline] dcaballe (Not Done): 1-d -> 1-D for consistency?
    [inline] c-rhodes (Author, Not Done): Ah good spot will fix this before landing, cheers
+     match the inner vector type of the 2-D vector type that represents the
+     tile. The updated tile is returned as the result.
+
+     Example 1: Move a vector<[16]xi8> into tile at given index.
+     ```mlir
+     %tile_update = arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[16]xi8> into vector<[16]x[16]xi8>
+     ```
+
+     Example 2: Move a vector<[2]xf64> into tile at given index.
+     ```mlir
+     %tile_update = arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[2]xf64> into vector<[2]x[2]xf64>
+     ```
+   }];
+   let arguments = (ins
+       SVEVector:$vector, SMETile:$tile, Index:$tile_slice_index);
+   let results = (outs SMETile:$result);
+
+   let extraClassDeclaration = [{
+     VectorType getTileType() {
+       return ::llvm::cast<VectorType>(getTile().getType());
+     }
+   }];
+
+   let assemblyFormat = [{
+     $vector `,` $tile `,` $tile_slice_index
+     attr-dict `:` type($vector) `into` type($result)
+   }];
+ }
+
  //===----------------------------------------------------------------------===//
  // ArmSME Intrinsic op definitions
  //===----------------------------------------------------------------------===//

  def MOPPredicate : ScalableVectorOfLengthAndType<[16, 8, 4, 2], [I1]>;
  def MOPVector : ScalableVectorOfLengthAndType<[16, 8, 4, 2],
                                                [I8, I16, BF16, F16, F32, F64]>;
  def LDSTPredicate : ScalableVectorOfLengthAndType<[16, 8, 4, 2, 1], [I1]>;
[97 lines not shown]

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp

[87 lines not shown]

  };

  /// Conversion pattern for dense arith.constant.
  struct ConstantOpToArmSMELowering : public OpRewritePattern<arith::ConstantOp> {
    using OpRewritePattern<arith::ConstantOp>::OpRewritePattern;

    LogicalResult matchAndRewrite(arith::ConstantOp constantOp,
                                  PatternRewriter &rewriter) const final {
-     auto vType = dyn_cast<VectorType>(constantOp.getType());
+     auto tileType = dyn_cast<VectorType>(constantOp.getType());
-     if (!vType || !arm_sme::isValidSMETileVectorType(vType))
+     if (!tileType || !arm_sme::isValidSMETileVectorType(tileType))
        return failure();

      auto denseAttr = dyn_cast<DenseElementsAttr>(constantOp.getValueAttr());
-     if (!denseAttr || !isSplatZero(vType.getElementType(), denseAttr))
+     if (!denseAttr || !denseAttr.isSplat())
        return failure();

-     rewriter.replaceOpWithNewOp<arm_sme::ZeroOp>(constantOp, vType);
+     auto tileElementType = tileType.getElementType();
+
+     // Lower 'arith.constant dense<0>' to 'arm_sme.zero' op.
    [inline] awarzynski (Not Done): [nit] Just to make the split into 2 cases more visible. Suggested change:
          auto tileElementType = tileType.getElementType();
        - // Lower 'arith.constant dense<0>' to 'arm_sme.zero' op.
        + // CASE 1: Lower 'arith.constant dense<0>' to 'arm_sme.zero' op.
          if (isSplatZero(tileElementType, denseAttr)) {
+     if (isSplatZero(tileElementType, denseAttr)) {
+       rewriter.replaceOpWithNewOp<arm_sme::ZeroOp>(constantOp, tileType);
+       return success();
+     }
+
+     // Lower non-zero constants to a loop of 'arm_sme.vector_to_tile_slice' ops
    [inline] awarzynski (Not Done): [nit] Just to make the split into 2 cases more visible. Suggested change:
          return success();
          }
        - // Lower non-zero constants to a loop of 'arm_sme.vector_to_tile_slice' ops
        + // CASE 2: Lower non-zero constants to a loop of 'arm_sme.vector_to_tile_slice' ops
          // that broadcast the constant to each tile slice.
+     // that broadcast the constant to each tile slice.
+     OpBuilder::InsertionGuard g(rewriter);
+     auto loc = constantOp.getLoc();
+
+     // Unpack 1-d vector type from 2-d vector type.
+     auto tileSliceType =
+         VectorType::get(tileType.getShape().drop_front(), tileElementType,
+                         /*scalableDims=*/{true});
+     auto denseAttr1D = DenseElementsAttr::get(
+         tileSliceType, denseAttr.getSplatValue<Attribute>());
+     auto constantOp1D = rewriter.create<arith::ConstantOp>(loc, denseAttr1D);
+
+     unsigned tileElementWidth = tileElementType.getIntOrFloatBitWidth();
+
+     // Create 'arm_sme.get_tile' op.
+     auto tileId = rewriter.create<arm_sme::GetTileID>(
+         loc, rewriter.getIntegerType(tileElementWidth));
+
+     // Create `arm_sme.cast_tile_to_vector` to cast tile ID to a vector type to
+     // use as input tile to 'arm_sme.vector_to_tile_slice' ops.
+     auto tile =
+         rewriter.create<arm_sme::CastTileToVector>(loc, tileType, tileId);
+
+     auto step = rewriter.create<arith::ConstantIndexOp>(loc, 1);
+     auto minTileSlices = rewriter.create<arith::ConstantIndexOp>(
    [inline] awarzynski (Done): IIUC, the following block will only create the loop structure. The following operation is not created here: "loads each ZA tile slice from memory." That's created further down. Also, rather than "loading from memory", this is creating "move from vector to a slice", right?
    [inline] c-rhodes (Author, Done): > IIUC, the following block will only create the loop structure. The following operation is not…
+         loc, arm_sme::getSMETileSliceMinNumElts(tileElementType));
+     auto vscale =
+         rewriter.create<vector::VectorScaleOp>(loc, rewriter.getIndexType());
+     auto lowerBound = rewriter.create<arith::ConstantIndexOp>(loc, 0);
+     auto numTileSlices =
+         rewriter.create<arith::MulIOp>(loc, minTileSlices, vscale);
+     // Create a loop that broadcasts the constant to each ZA tile slice.
+     auto forOp =
+         rewriter.create<scf::ForOp>(loc, lowerBound, numTileSlices, step);
+     rewriter.setInsertionPointToStart(forOp.getBody());
+     auto tileSliceIndex = forOp.getInductionVar();
+
+     // Create 'arm_sme.vector_to_tile_slice' to write vector to tile slice.
+     rewriter.create<arm_sme::VectorToTileSliceOp>(loc, tileType, constantOp1D,
+                                                   tile, tileSliceIndex);
+
+     rewriter.setInsertionPointAfter(forOp);
+
+     rewriter.replaceOp(constantOp, tile);
+
      return success();
    }
  };

  } // namespace

  void mlir::populateVectorToArmSMEPatterns(RewritePatternSet &patterns,
                                            MLIRContext &ctx) {
    patterns.add<TransferWriteToArmSMELowering, VectorLoadToArmSMELowering,
                 VectorStoreToArmSMELowering, ConstantOpToArmSMELowering>(&ctx);
  }

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp

[294 lines not shown]	case 64:
            storeTileSliceOp, allActiveMask, ptr, tileI32, tileSliceI32);
        break;
      }

      return success();
    }
  };

+ /// Lower `arm_sme.vector_to_tile_slice` to SME intrinsics. Only horizontal
+ /// tile slices are currently supported.
+ struct VectorToTileSliceToArmSMELowering
+     : public ConvertOpToLLVMPattern<arm_sme::VectorToTileSliceOp> {
+   using ConvertOpToLLVMPattern<
+       arm_sme::VectorToTileSliceOp>::ConvertOpToLLVMPattern;
+
+   LogicalResult
+   matchAndRewrite(arm_sme::VectorToTileSliceOp vectorToTileSliceOp,
+                   arm_sme::VectorToTileSliceOp::Adaptor adaptor,
+                   ConversionPatternRewriter &rewriter) const override {
+     auto loc = vectorToTileSliceOp.getLoc();
+     auto tileType = vectorToTileSliceOp.getTileType();
+     auto tileElementType = tileType.getElementType();
+     unsigned tileElementWidth = tileElementType.getIntOrFloatBitWidth();
+
+     // Create 'arm_sme.cast_vector_to_tile' to get a tile ID for the tile being
+     // loaded to.
+     auto tile = rewriter.create<arm_sme::CastVectorToTile>(
+         loc, rewriter.getIntegerType(tileElementWidth),
+         vectorToTileSliceOp.getTile());
+
+     auto tileSlice = vectorToTileSliceOp.getTileSliceIndex();
+
+     // Cast tile slice from index to i32 for intrinsic.
+     auto tileSliceI32 = rewriter.create<arith::IndexCastUIOp>(
+         loc, rewriter.getI32Type(), tileSlice);
+
+     // Create all active predicate mask.
+     auto one = rewriter.create<arith::ConstantOp>(
+         loc, rewriter.getI1Type(),
+         rewriter.getIntegerAttr(rewriter.getI1Type(), 1));
+     auto predTy = VectorType::get(tileType.getShape()[0], rewriter.getI1Type(),
+                                   /*scalableDims=*/{true});
+     auto allActiveMask = rewriter.create<vector::SplatOp>(loc, predTy, one);
    [inline] dcaballe (Not Done): We are usually creating masks with `vector.create_mask` and `vector.constant_mask` but maybe it's already too late to introduce them here. I would expect them to have been lowered in an earlier stage. Just bringing this up in case you thought about it. (I'm not asking for changes)
    [inline] c-rhodes (Author, Done): I hadn't thought about that, thanks for mentioning. I can't see why it wouldn't work since a vector op (splat) is already added here, but I'm not really sure it simplifies this.
    [inline] dcaballe (Not Done): Well, if this is expected to run before some of these "higher" level vector ops are lowered, I would make sure we generate those. We have patterns that are looking specifically for those and are able to understand if they are all-true/all-false masks, etc. We don't have mask patterns looking at constant ops.
    [inline] c-rhodes (Author, Done): Sorry, I'm not sure I follow. This is low-level and emits the canonical form for an all-active mask (a constant?), so it's not clear to me what the benefit of such patterns is in this case. I should mention there has been no consideration of masking so far in the ops introduced for our first target for SME of lowering linalg.fill; now that there is a basic path established (D158619) I've started looking into masking.
+
+     auto tileI32 = castTileIDToI32(tile, loc, rewriter);
+
+     // Create 'arm_sme.intr.write.horiz' to write vector to tile slice.
+     rewriter.create<arm_sme::aarch64_sme_write_horiz>(
+         loc, tileI32, tileSliceI32, allActiveMask,
+         vectorToTileSliceOp.getVector());
+
+     // Intrinsic has no result, replace 'arm_sme.vector_to_tile_slice' with
+     // 'arm_sme.cast_tile_to_vector' to preserve dataflow.
+     rewriter.replaceOpWithNewOp<arm_sme::CastTileToVector>(vectorToTileSliceOp,
+                                                            tileType, tile);
+
+     return success();
+   }
+ };
+
  } // namespace

  void mlir::configureArmSMELegalizeForExportTarget(
      LLVMConversionTarget &target) {
    target.addLegalOp<
        scf::ForOp, scf::YieldOp, arm_sme::CastTileToVector,
        arm_sme::CastVectorToTile, arm_sme::aarch64_sme_zero,
        arm_sme::aarch64_sme_str, arm_sme::aarch64_sme_ld1b_horiz,
        arm_sme::aarch64_sme_ld1h_horiz, arm_sme::aarch64_sme_ld1w_horiz,
        arm_sme::aarch64_sme_ld1d_horiz, arm_sme::aarch64_sme_st1b_horiz,
        arm_sme::aarch64_sme_st1h_horiz, arm_sme::aarch64_sme_st1w_horiz,
-       arm_sme::aarch64_sme_st1d_horiz, arm_sme::aarch64_sme_za_enable,
-       arm_sme::aarch64_sme_za_disable>();
+       arm_sme::aarch64_sme_st1d_horiz, arm_sme::aarch64_sme_write_horiz,
+       arm_sme::aarch64_sme_za_enable, arm_sme::aarch64_sme_za_disable>();
    target.addLegalOp<GetTileID>();

    // Mark 'func.func' ops as legal if either:
    //   1. no 'arm_za' function attribute is present.
    //   2. the 'arm_za' function attribute is present and the first op in the
    //      function is an 'arm_sme::aarch64_sme_za_enable' intrinsic.
    target.addDynamicallyLegalOp<func::FuncOp>([&](func::FuncOp funcOp) {
      if (funcOp.isDeclaration())
[14 lines not shown]	funcOp->walk<WalkOrder::PreOrder>(
          [&](arm_sme::aarch64_sme_za_disable op) { hasDisableZA = true; });
      return !funcOp->hasAttr("arm_za") || hasDisableZA;
    });
  }

  void mlir::populateArmSMELegalizeForLLVMExportPatterns(
      LLVMTypeConverter &converter, RewritePatternSet &patterns) {
    patterns.add<EnableZAPattern, DisableZAPattern>(patterns.getContext());
-   patterns.add<ZeroOpConversion, StoreTileSliceToArmSMELowering,
-                LoadTileSliceToArmSMELowering>(converter);
+   patterns
+       .add<ZeroOpConversion, StoreTileSliceToArmSMELowering,
+            LoadTileSliceToArmSMELowering, VectorToTileSliceToArmSMELowering>(
+           converter);
  }

mlir/test/Dialect/ArmSME/arith-ops-to-sme.mlir

	Show First 20 Lines • Show All 77 Lines • ▼ Show 20 Lines

	// CHECK-LABEL: @arith_constant_dense_2d_zero_f64
	// CHECK: %[[ZERO:.*]] = arm_sme.zero : vector<[2]x[2]xf64>
	func.func @arith_constant_dense_2d_zero_f64() {
	%zero = arith.constant dense<0.0> : vector<[2]x[2]xf64>
	"prevent.dce"(%zero) : (vector<[2]x[2]xf64>) -> ()
	return
	}

				// =============================================================================
				// Non-zero arith.constant dense to SME
				// =============================================================================

				// -----

				// CHECK-LABEL: func.func @arith_constant_dense_2d_nonzero_i8() {
				// CHECK: %[[C2_SPLAT:.*]] = arith.constant dense<2> : vector<[16]xi8>
				// CHECK: %[[C1:.*]] = arith.constant 1 : index
				// CHECK: %[[C16:.*]] = arith.constant 16 : index
				// CHECK: %[[C0:.*]] = arith.constant 0 : index
				// CHECK: %[[GET_TILE_ID:.*]] = arm_sme.get_tile_id : i8
				// CHECK: %[[TILE:.*]] = arm_sme.cast_tile_to_vector %[[GET_TILE_ID]] : i8 to vector<[16]x[16]xi8>
				// CHECK: %[[VSCALE:.*]] = vector.vscale
				// CHECK: %[[NUM_TILE_SLICES:.*]] = arith.muli %[[VSCALE]], %[[C16]] : index
				// CHECK: scf.for %[[TILE_SLICE_INDEX:.*]] = %[[C0]] to %[[NUM_TILE_SLICES]] step %[[C1]] {
				// CHECK: arm_sme.vector_to_tile_slice %[[C2_SPLAT]], %[[TILE]], %[[TILE_SLICE_INDEX]] : vector<[16]xi8> into vector<[16]x[16]xi8>
				// CHECK: "prevent.dce"(%[[TILE]]) : (vector<[16]x[16]xi8>) -> ()
				func.func @arith_constant_dense_2d_nonzero_i8() {
				%two = arith.constant dense<2> : vector<[16]x[16]xi8>
				"prevent.dce"(%two) : (vector<[16]x[16]xi8>) -> ()
				[Inline comment] nicolasvasilache: Side note, can you just return the value instead of using this fake op, or is there something more fundamental that does not let us return a scalable vector here?
				[Inline comment] c-rhodes (author): For the purposes of this test the scalable vector could be returned, but in general we can't support passing or returning 2-d scalable vectors to/from functions, since these types can't be lowered to LLVM and, even if they could, it's not defined by the ABI. For this reason I opted for the fake use op so as not to set a precedent that this is something that can be done. I believe there are some earlier ArmSME tests where 2-d scalable vectors are returned, however; we should probably update them for consistency.
				return
				}

				// -----

				// CHECK-LABEL: func.func @arith_constant_dense_2d_nonzero_f64() {
				// CHECK: %[[C2_SPLAT:.*]] = arith.constant dense<2.000000e+00> : vector<[2]xf64>
				// CHECK: %[[C1:.*]] = arith.constant 1 : index
				// CHECK: %[[C2:.*]] = arith.constant 2 : index
				// CHECK: %[[C0:.*]] = arith.constant 0 : index
				// CHECK: %[[GET_TILE_ID:.*]] = arm_sme.get_tile_id : i64
				// CHECK: %[[TILE:.*]] = arm_sme.cast_tile_to_vector %[[GET_TILE_ID]] : i64 to vector<[2]x[2]xf64>
				// CHECK: %[[VSCALE:.*]] = vector.vscale
				// CHECK: %[[NUM_TILE_SLICES:.*]] = arith.muli %[[VSCALE]], %[[C2]] : index
				// CHECK: scf.for %[[TILE_SLICE_INDEX:.*]] = %[[C0]] to %[[NUM_TILE_SLICES]] step %[[C1]] {
				// CHECK: arm_sme.vector_to_tile_slice %[[C2_SPLAT]], %[[TILE]], %[[TILE_SLICE_INDEX]] : vector<[2]xf64> into vector<[2]x[2]xf64>
				// CHECK: "prevent.dce"(%[[TILE]]) : (vector<[2]x[2]xf64>) -> ()
				func.func @arith_constant_dense_2d_nonzero_f64() {
				%two = arith.constant dense<2.0> : vector<[2]x[2]xf64>
				"prevent.dce"(%two) : (vector<[2]x[2]xf64>) -> ()
				return
				}
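
The lowering these two non-zero tests check can be modelled outside MLIR. The following is a minimal C++ sketch, not the pattern from VectorToArmSME.cpp: `Tile`, `fillTile`, and the parameters `minElts` and `vscale` are hypothetical names chosen for illustration. It assumes only what the CHECK lines show: the constant is splatted into a 1-D vector, the trip count is `vscale * minElts`, and each iteration writes the splat to one horizontal tile slice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a 2-D ZA tile: one inner vector per horizontal slice.
template <typename T>
using Tile = std::vector<std::vector<T>>;

// Models the materialized loop: splat `value` into a 1-D vector, then write
// that vector to every tile slice in turn.
template <typename T>
Tile<T> fillTile(T value, std::size_t minElts, std::size_t vscale) {
  assert(minElts > 0 && vscale > 0 && "tile dimensions must be non-zero");
  // Corresponds to: arith.muli %vscale, %cN (N = 16 for i8, 2 for f64).
  const std::size_t numTileSlices = vscale * minElts;
  // Corresponds to the 1-D splat constant, e.g. dense<2> : vector<[16]xi8>.
  const std::vector<T> splat(numTileSlices, value);
  // The tile's contents before the fill are not modelled; T{} is a placeholder.
  Tile<T> tile(numTileSlices, std::vector<T>(numTileSlices, T{}));
  for (std::size_t slice = 0; slice < numTileSlices; ++slice)
    tile[slice] = splat; // stands in for arm_sme.vector_to_tile_slice
  return tile;
}
```

For example, `fillTile<int>(2, 16, 1)` models the i8 case at the minimum vector length (`vscale` = 1) and yields a 16x16 tile of 2s; the real trip count is only known at run time.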

mlir/test/Dialect/ArmSME/invalid.mlir

	// -----

	func.func @arm_sme_get_tile_id__bad_type() -> i1 {
	// expected-error@+1 {{op result #0 must be 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 128-bit signless integer}}
	%0 = arm_sme.get_tile_id : i1
	return %0 : i1
	}

				// -----

				func.func @arm_sme_vector_to_tile_slice_i8__bad_vector_type(%vector : vector<[8]xi8>, %tile : vector<[16]x[16]xi8>, %tile_slice_index : index) -> vector<[16]x[16]xi8> {
				%c0 = arith.constant 0 : index
				// expected-error@+1 {{op failed to verify that type of 'vector' matches element type of 'tile'}}
				%0 = arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[8]xi8> into vector<[16]x[16]xi8>
				return %0 : vector<[16]x[16]xi8>
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_f32__bad_vector_type(%vector : vector<[8]xf32>, %tile : vector<[4]x[4]xf32>, %tile_slice_index : index) -> vector<[4]x[4]xf32> {
				%c0 = arith.constant 0 : index
				// expected-error@+1 {{op failed to verify that type of 'vector' matches element type of 'tile'}}
				%0 = arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[8]xf32> into vector<[4]x[4]xf32>
				return %0 : vector<[4]x[4]xf32>
				}
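
The two negative tests above exercise the type constraint between the 1-D vector and the 2-D tile. As a rough illustration of that check, here is a small C++ sketch under stated assumptions: `ScalableVectorType` and `isSliceTypeOfTile` are hypothetical stand-ins, not MLIR classes or the actual constraint in ArmSME.td. It treats the vector as valid when its element type and length match the tile's inner dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a scalable vector type such as
// vector<[16]x[16]xi8>: scalable dimension sizes plus an element type name.
struct ScalableVectorType {
  std::vector<std::size_t> shape;
  std::string elementType;
};

// Returns true when `vector` is the 1-D slice type of the 2-D `tile`, i.e.
// it has the same element type and a length equal to the tile's inner
// dimension.
bool isSliceTypeOfTile(const ScalableVectorType &vector,
                       const ScalableVectorType &tile) {
  assert(tile.shape.size() == 2 && "tile must be 2-D");
  return vector.shape.size() == 1 && vector.shape[0] == tile.shape[1] &&
         vector.elementType == tile.elementType;
}
```

Under this model, `vector<[8]xi8>` is rejected for a `vector<[16]x[16]xi8>` tile and `vector<[8]xf32>` is rejected for a `vector<[4]x[4]xf32>` tile, matching the two expected errors.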

mlir/test/Dialect/ArmSME/roundtrip.mlir

	// -----

	func.func @arm_sme_store_tile_slice_f64(%tile : vector<[2]x[2]xf64>, %tile_slice_index : index, %dest : memref<?x?xf64>) -> () {
	// CHECK: arm_sme.store_tile_slice {{.*}} : memref<?x?xf64>, vector<[2]x[2]xf64>
	%c0 = arith.constant 0 : index
	arm_sme.store_tile_slice %tile, %tile_slice_index, %dest[%c0] : memref<?x?xf64>, vector<[2]x[2]xf64>
	return
	}

				// -----

				func.func @arm_sme_vector_to_tile_slice_i8(%vector : vector<[16]xi8>, %tile : vector<[16]x[16]xi8>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[16]xi8> into vector<[16]x[16]xi8>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[16]xi8> into vector<[16]x[16]xi8>
				[Inline comment] awarzynski: Could you add some invalid cases in "invalid.mlir"?
				[Inline comment] c-rhodes (author): Done. It was also missing a type constraint that verifies the 1-d vector type matches the inner vector type of the 2-d vector type.
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_i16(%vector : vector<[8]xi16>, %tile : vector<[8]x[8]xi16>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[8]xi16> into vector<[8]x[8]xi16>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[8]xi16> into vector<[8]x[8]xi16>
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_i32(%vector : vector<[4]xi32>, %tile : vector<[4]x[4]xi32>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[4]xi32> into vector<[4]x[4]xi32>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[4]xi32> into vector<[4]x[4]xi32>
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_i64(%vector : vector<[2]xi64>, %tile : vector<[2]x[2]xi64>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[2]xi64> into vector<[2]x[2]xi64>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[2]xi64> into vector<[2]x[2]xi64>
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_i128(%vector : vector<[1]xi128>, %tile : vector<[1]x[1]xi128>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[1]xi128> into vector<[1]x[1]xi128>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[1]xi128> into vector<[1]x[1]xi128>
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_f16(%vector : vector<[8]xf16>, %tile : vector<[8]x[8]xf16>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[8]xf16> into vector<[8]x[8]xf16>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[8]xf16> into vector<[8]x[8]xf16>
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_bf16(%vector : vector<[8]xbf16>, %tile : vector<[8]x[8]xbf16>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[8]xbf16> into vector<[8]x[8]xbf16>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[8]xbf16> into vector<[8]x[8]xbf16>
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_f32(%vector : vector<[4]xf32>, %tile : vector<[4]x[4]xf32>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[4]xf32> into vector<[4]x[4]xf32>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[4]xf32> into vector<[4]x[4]xf32>
				return
				}

				// -----

				func.func @arm_sme_vector_to_tile_slice_f64(%vector : vector<[2]xf64>, %tile : vector<[2]x[2]xf64>, %tile_slice_index : index) -> () {
				// CHECK: arm_sme.vector_to_tile_slice {{.*}} : vector<[2]xf64> into vector<[2]x[2]xf64>
				%c0 = arith.constant 0 : index
				arm_sme.vector_to_tile_slice %vector, %tile, %tile_slice_index : vector<[2]xf64> into vector<[2]x[2]xf64>
				return
				}

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir

This file was added.

				// RUN: mlir-opt %s -enable-arm-streaming="mode=locally enable-za" \
				// RUN: -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
				// RUN: -convert-vector-to-llvm="enable-arm-sme" -cse -canonicalize \
				// RUN: -allocate-arm-sme-tiles -test-lower-to-llvm | \
				// RUN: %mcr_aarch64_cmd \
				// RUN: -march=aarch64 -mattr=+sve,+sme \
				// RUN: -e entry -entry-point-result=i32 \
				// RUN: -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils | \
				// RUN: FileCheck %s

				// Integration test demonstrating filling a 32-bit element ZA tile with a
				// non-zero constant via vector to tile (MOVA) ops.

				llvm.func @printCString(!llvm.ptr<i8>)

				func.func @printTileBegin() {
				%0 = llvm.mlir.addressof @str_tile_begin : !llvm.ptr<array<11 x i8>>
				%1 = llvm.mlir.constant(0 : index) : i64
				%2 = llvm.getelementptr %0[%1, %1]
				: (!llvm.ptr<array<11 x i8>>, i64, i64) -> !llvm.ptr<i8>
				llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
				return
				}

				func.func @printTileEnd() {
				%0 = llvm.mlir.addressof @str_tile_end : !llvm.ptr<array<9 x i8>>
				%1 = llvm.mlir.constant(0 : index) : i64
				%2 = llvm.getelementptr %0[%1, %1]
				: (!llvm.ptr<array<9 x i8>>, i64, i64) -> !llvm.ptr<i8>
				llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
				return
				}

				func.func @entry() -> i32 {
				%c0 = arith.constant 0 : index
				%c1_index = arith.constant 1 : index

				%min_elts_s = arith.constant 4 : index
				%vscale = vector.vscale

				// "svl" refers to the Streaming Vector Length and "svl_s" the number of
				// 32-bit elements in a vector of SVL bits.
				%svl_s = arith.muli %min_elts_s, %vscale : index

				// Allocate memory.
				%tilesize = arith.muli %svl_s, %svl_s : index
				%mem = memref.alloca(%tilesize) : memref<?xi32>

				// Fill a tile with '123'. This will get lowered to a 1-d vector splat of
				// '123' and a loop that writes this vector to each tile slice in the ZA
				// tile.
				%tile = arith.constant dense<123> : vector<[4]x[4]xi32>

				// Store tile to memory so it can be dumped.
				vector.store %tile, %mem[%c0] : memref<?xi32>, vector<[4]x[4]xi32>
				[Inline comment] awarzynski: [nit] I would use some other value - 1 is super common and can be easily missed. Here it would be nice to emphasise that it could be _anything_.

				// Dump "mem". The smallest SVL is 128-bits so the tile will be at least
				// 4x4xi32.
				//
				// CHECK: TILE BEGIN
				// CHECK-NEXT: ( 123, 123, 123, 123
				// CHECK-NEXT: ( 123, 123, 123, 123
				// CHECK-NEXT: ( 123, 123, 123, 123
				// CHECK-NEXT: ( 123, 123, 123, 123
				// CHECK: TILE END
				func.call @printTileBegin() : () -> ()
				scf.for %i = %c0 to %tilesize step %svl_s {
				%tileslice = vector.load %mem[%i] : memref<?xi32>, vector<[4]xi32>
				vector.print %tileslice : vector<[4]xi32>
				}
				func.call @printTileEnd() : () -> ()

				%c0_i32 = arith.constant 0 : i32
				return %c0_i32 : i32
				}
				[Inline comment] benmxwl-arm: It's landed :)
				[Inline comment] c-rhodes (author): Thanks for the heads up, I'll update this.

				llvm.mlir.global internal constant @str_tile_begin("TILE BEGIN\0A")
				llvm.mlir.global internal constant @str_tile_end("TILE END\0A")

Revision Contents

Diff 552261