This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ArmSME] Add move vector to tile slice op and lowerings
ClosedPublic

Authored by c-rhodes on Aug 3 2023, 7:33 AM.

Details

Summary

This adds a 'move_vector_to_tile_slice' op to the ArmSME dialect that
moves a 1-D scalable vector to a slice of a 2-D tile at a given index.

This is lowered to the 'llvm.aarch64.sme.write.horiz' intrinsic that
maps to the MOVA (vector to tile, single) SME instruction [1] when
lowering to LLVM. Like the SME load and store instructions this operates
on ZA tile slices, which are 1D vectors of horizontally or vertically
contiguous elements within a ZA tile.

This patch extends the lowering of 'arith.constant' to SME to support
non-zero constants using this new op. This requires materializing a
loop that broadcasts the constant to each tile slice with the
'vector_to_tile_slice' op. Unlike load and store, this is done during
conversion from Vector to ArmSME, rather than ArmSME to SCF. The latter
would require a higher-level custom op in the ArmSME dialect like
'tile_load' and 'tile_store' and this isn't necessary. We may also
remove the load and store ops in the future in favour of lowering
straight from Vector, at which point this would converge.

Currently only horizontal tile slices are supported. A future patch will
extend this mechanism to support 'vector.broadcast'.

Depends on D156980 D157004

[1] https://developer.arm.com/documentation/ddi0602

Diff Detail

Event Timeline

c-rhodes created this revision.Aug 3 2023, 7:33 AM
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
c-rhodes requested review of this revision.Aug 3 2023, 7:33 AM
dcaballe accepted this revision.Aug 8 2023, 2:47 PM

LGTM in general, thanks!

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
424

-> element type?

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
340–345

We are usually creating masks with vector.create_mask and vector.constant_mask but maybe it's already too late to introduce them here. I would expect them to have been lowered in an earlier stage. Just bringing this up in case you thought about it. (I'm not asking for changes)

This revision is now accepted and ready to land.Aug 8 2023, 2:47 PM

A few minor points/questions. LG otherwise

mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417

Most Ops' names convey what the corresponding "action"/"function" is. Wondering whether this shouldn't be MoveVectorToTileSliceOp instead. Naming is hard.

mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
106

[nit] Just to make the split into 2 cases more visible.

112

[nit] Just to make the split into 2 cases more visible.

137

IIUC, the following block will only create the loop structure. The following operation is not created here:

loads each ZA tile slice from memory.

That's created further down. Also, rather than "loading from memory", this is creating "move from vector to a slice", right?

mlir/test/Dialect/ArmSME/roundtrip.mlir
586

Could you add some invalid cases in "invalid.mlir"?

mlir/test/Dialect/ArmSME/vector-ops-to-sme.mlir
251 ↗(On Diff #546856)

It would be good to check at least one more element type.

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
55

[nit] I would use some other value - 1 is super common and can be easily missed. Here it would be nice to emphasise that it could be _anything_.

benmxwl-arm added inline comments.Aug 21 2023, 10:10 AM
mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
75

It's landed :)

c-rhodes added inline comments.Aug 22 2023, 1:08 AM
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417

Most Ops' names convey what the corresponding "action"/"function" is. Wondering whether this shouldn't be MoveVectorToTileSliceOp instead. Naming is hard.

are you suggesting to rename it both internally and externally (i.e. move_vector_to_tile_slice) or just the former?

424

-> element type?

For a 2-d scalable vector such as vector<[4]x[4]xi32> is the element type vector<[4]xi32>? That's what I want to express here but I do struggle with these descriptions. FWIW getElementType() returns the scalar i32. Perhaps The 1-D vector type must match the vector type of the inner dimension of the 2-D vector type?

mlir/test/Integration/Dialect/Vector/CPU/ArmSME/tile_fill.mlir
75

It's landed :)

Thanks for heads up I'll update this

c-rhodes updated this revision to Diff 552261.Aug 22 2023, 1:31 AM
  • Address most comments.
  • Add type constraint to VectorToTileSliceOp that verifies 1-D vector type matches element type of 2-D vector type (tile).
  • Rename VectorToTileSliceOp::getVectorType() -> VectorToTileSliceOp::getTileType() to prevent confusion with vector operand.
c-rhodes marked 5 inline comments as done.Aug 22 2023, 1:35 AM
c-rhodes added inline comments.
mlir/lib/Conversion/VectorToArmSME/VectorToArmSME.cpp
137

IIUC, the following block will only create the loop structure. The following operation is not created here:

loads each ZA tile slice from memory.

That's created further down. Also, rather than "loading from memory", this is creating "move from vector to a slice", right?

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
340–345

We are usually creating masks with vector.create_mask and vector.constant_mask but maybe it's already too late to introduce them here. I would expect them to have been lowered in an earlier stage. Just bringing this up in case you thought about it. (I'm not asking for changes)

I hadn't thought about that, thanks for mentioning. I can't see why it wouldn't work since a vector op (splat) is already added here, but I'm not really sure it simplifies this.

mlir/test/Dialect/ArmSME/roundtrip.mlir
586

Could you add some invalid cases in "invalid.mlir"?

Done, was also missing a type constraint that verifies 1-d vector type matches inner vector type of 2-d vector type.

awarzynski added inline comments.Aug 22 2023, 2:03 AM
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417

I was thinking both.

424

I also though that you meant the element type here, as in e.g. i32 or f32.

I think that we need to be more explicit here:

The type of the 1-d scalable vector to be moved must match the type of the tile slice (note that 1 slice is effectively 1 row in a virtual tile).

WDYT?

c-rhodes updated this revision to Diff 552337.Aug 22 2023, 6:47 AM
c-rhodes marked 2 inline comments as done.

Update descriptions for op and type constraint.

c-rhodes added inline comments.Aug 22 2023, 6:50 AM
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
417

I was thinking both.

Still considering, it's more descriptive but also it's quite a long name

424

I also though that you meant the element type here, as in e.g. i32 or f32.

I think that we need to be more explicit here:

The type of the 1-d scalable vector to be moved must match the type of the tile slice (note that 1 slice is effectively 1 row in a virtual tile).

WDYT?

I've updated the description along these lines, hope it makes more sense now, appreciate the input!

c-rhodes updated this revision to Diff 553048.Aug 24 2023, 2:27 AM
c-rhodes retitled this revision from [mlir][ArmSME] Add vector to tile slice op and lowerings to [mlir][ArmSME] Add move vector to tile slice op and lowerings.
c-rhodes edited the summary of this revision. (Show Details)

Rename vector_to_tile_slice to move_vector_to_tile_slice

c-rhodes marked an inline comment as done.Aug 24 2023, 2:28 AM
dcaballe added inline comments.Aug 27 2023, 9:45 PM
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
424

I would call that inner 1-D sub-vector?

430

1-d -> 1-D for consistency?

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
340–345

Well, if this is expected to run before some of these "higher" level vector ops are lowered, I would make sure we generate those. We have patterns that are looking specifically for those and are able to understand if they are all-true/all-false masks, etc. We don't have mask patterns looking at constant ops.

awarzynski accepted this revision.Aug 29 2023, 1:36 AM

LGTM, thanks!

Following on from what Diego suggest re vector.mask, we probably should look into using those instead of vector.splat. At some point soon, not in this patch :)

c-rhodes added inline comments.Aug 29 2023, 1:47 AM
mlir/include/mlir/Dialect/ArmSME/IR/ArmSME.td
424

I would call that inner 1-D sub-vector?

Hm, it's not an inner 1-D sub-vector until it's moved, I think that would make sense for the inverse of this operation (extract) that does tile slice to vector.

430

1-d -> 1-D for consistency?

Ah good spot will fix this before landing, cheers

mlir/lib/Dialect/ArmSME/Transforms/LegalizeForLLVMExport.cpp
340–345

Well, if this is expected to run before some of these "higher" level vector ops are lowered, I would make sure we generate those. We have patterns that are looking specifically for those and are able to understand if they are all-true/all-false masks, etc. We don't have mask patterns looking at constant ops.

Sorry I'm not sure I follow, this is low-level and emitting the canonical form for an all active mask (a constant?), it's not clear to me what the benefit of such patterns are in this case?

I should mention there has been no consideration of masking so far in the ops introduced for our first target for SME of lowering linalg.fill, now that there is a basic path established (D158619) I've starting looking into masking.

mlir/test/Dialect/ArmSME/arith-ops-to-sme.mlir
107

side note, can you just return the value instead of using this fake op or is there something more fundamental that does not let us return a scalable vector here ?

c-rhodes added inline comments.Aug 29 2023, 9:09 AM
mlir/test/Dialect/ArmSME/arith-ops-to-sme.mlir
107

side note, can you just return the value instead of using this fake op or is there something more fundamental that does not let us return a scalable vector here ?

For the purposes of this test the scalable vector could be returned, but generally we can't support passing or returning 2-d scalable vectors to/from functions since these types can't be lowered to LLVM and even if they could it's not defined by the ABI. For this reason I opted for the fake use op so as to not set a precedent that this is something that can be done. I believe there are some earlier ArmSME tests where 2-d scalable vector are returned however, we should probably update them for consistency.