This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ArmSME] Calculate correct tile mask when lowering arm_sme.zero
ClosedPublic

Authored by benmxwl-arm on Aug 14 2023, 10:11 AM.

Details

Summary

This patch updates the lowering of arm_sme.zero to intrinsics so
that it calculates the correct mask for the tile to zero.

The zero instruction takes an 8-bit mask that specifies which 64-bit
tiles to zero: bits 0 to 7 correspond to ZA0.D to ZA7.D. Zeroing a tile
with an element size of 8 to 32 bits just requires zeroing the
corresponding 64-bit tiles.

This is quite easy to calculate: each element size has a "base mask"
which can be shifted left by the tile ID to get the mask for that tile.

base_mask << tile_id

After tile allocation, this will be folded to a constant mask.
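
The calculation above can be sketched in standalone C++. This is only an illustration, not the code from the patch: the function names are hypothetical, and the base mask values are an assumption based on how the SME tiles alias the 64-bit tiles (for example, ZA0.S overlapping ZA0.D and ZA4.D).

```cpp
#include <cassert>
#include <cstdint>

// Base mask for tile 0 of a given element width. Bit n stands for ZAn.D.
// ASSUMPTION: values follow the SME tile aliasing, not the patch itself.
inline uint8_t baseMaskForWidth(unsigned elementWidthBits) {
  switch (elementWidthBits) {
  case 8:
    return 0b11111111; // ZA0.B overlaps ZA0.D to ZA7.D
  case 16:
    return 0b01010101; // ZA0.H overlaps ZA0.D, ZA2.D, ZA4.D, ZA6.D
  case 32:
    return 0b00010001; // ZA0.S overlaps ZA0.D and ZA4.D
  case 64:
    return 0b00000001; // ZA0.D is a single 64-bit tile
  default:
    return 0; // unsupported width in this sketch
  }
}

// Mask for one tile: base_mask << tile_id.
inline uint8_t zeroMaskForTile(unsigned elementWidthBits, unsigned tileId) {
  // There are (width / 8) tiles per element width: 1, 2, 4 or 8.
  assert(tileId < elementWidthBits / 8 && "tile ID out of range");
  return static_cast<uint8_t>(baseMaskForWidth(elementWidthBits) << tileId);
}
```

Under these assumptions, ZA1.S (32-bit elements, tile ID 1) gets the mask 0b00100010, which selects ZA1.D and ZA5.D.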


Event Timeline

benmxwl-arm created this revision. Aug 14 2023, 10:11 AM
Herald added a reviewer: ftynse.
Herald added a reviewer: dcaballe.
Herald added a project: Restricted Project.
benmxwl-arm requested review of this revision. Aug 14 2023, 10:11 AM

Fix some typos

Link to Arm docs on the SME zero instruction.

Makes sense, but how about zeroing multiple tiles at a time? See https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AArch64/sme-intrinsics-zero.ll for an example. We don't need to support all cases in one patch, but it would be good to know whether that's on the horizon :)

Add a little more explanation in the comments

Allocating multiple zero'd tiles would be a pretty simple extension: it
just requires OR-ing the masks together. It would require updating the
interface for the op though, so it is best left for a follow-up patch :)
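
As a rough sketch of that extension (not part of this patch), combining per-tile masks is a plain bitwise OR. The helper name below is hypothetical, and the example masks in the note assume the SME tile layout in which ZA0.S maps to 0b00010001 and ZA1.S to 0b00100010.

```cpp
#include <cstdint>
#include <initializer_list>

// OR together the 8-bit zero masks of several tiles so that a single
// zero instruction could clear all of them. Hypothetical helper.
inline uint8_t combineZeroMasks(std::initializer_list<uint8_t> tileMasks) {
  uint8_t combined = 0;
  for (uint8_t mask : tileMasks)
    combined |= mask; // each set bit selects one 64-bit tile (ZAn.D)
  return combined;
}
```

For instance, combineZeroMasks({0x11, 0x22}) yields 0x33, which under the stated assumption would zero ZA0.S and ZA1.S together.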

awarzynski accepted this revision. Aug 16 2023, 1:34 AM

LGTM, thank you Ben!

This revision is now accepted and ready to land. Aug 16 2023, 1:34 AM