This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SME2] Add intrinsics to move multi-vectors to/from ZA.
ClosedPublic

Authored by kmclaughlin on Jan 18 2023, 8:46 AM.

Details

Summary

Adds intrinsics for the following:

  • mova: array to vector / vector to array
  • mova: tile to vector / vector to tile

Tablegen patterns have been added to match the ZA write intrinsics. As the
read intrinsics return a multi-vector, a function called SelectMultiVectorMove
has been added to AArch64ISelDAGToDAG to select the correct instruction. The
SelectSMETile function has also been added to check that the tile number
passed to read intrinsics is valid for the base register.

This patch also cleans up the sme_vector_to_tile_patterns multiclass to remove
the pattern for an offset of 0, which is handled by tileslice.

NOTE: These intrinsics are still in development and are subject to future changes.

Diff Detail

Event Timeline

kmclaughlin created this revision.Jan 18 2023, 8:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2023, 8:46 AM
kmclaughlin requested review of this revision.Jan 18 2023, 8:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2023, 8:46 AM
david-arm added inline comments.Jan 19 2023, 5:46 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
2928

I wonder - given these are moving from a tile to a vector is it perhaps better named as something like

SME2_Matrix_TileVector_Read_VG2_Intrinsic
SME2_Matrix_TileVector_Read_VG4_Intrinsic
SME2_Matrix_TileVector_Write_VG2_Intrinsic
SME2_Matrix_TileVector_Write_VG4_Intrinsic

and the others are actually reading from the array so perhaps these can be

SME2_ZA_ArrayVector_Read_VG2_Intrinsic
SME2_ZA_ArrayVector_Read_VG4_Intrinsic
SME2_ZA_ArrayVector_Write_VG2_Intrinsic
SME2_ZA_ArrayVector_Write_VG4_Intrinsic

what do you think?

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1838

I think this should probably be

case AArch64::ZAB0:
  if (TileNum == 0)
    break;
  return false;
case ...
1841

This indenting here doesn't look right I think?

1864

Maybe it's worth moving this code below the call to SelectSMETile, so it's close to where it's used?

1875

I wonder if it's better to simply return here and let it crash with a selection error? The problem with llvm_unreachable I think is that for a release build it will be a nop and will silently do the wrong thing.

4750

This is just a thought so feel free to ignore it if you think it makes things worse! But I wondered if you could avoiding passing the TileNum here, since you're already passing in the Node anyway. That way in SelectMultiVectorMove you could do:

unsigned TileNum = 0;
if (BaseReg != AArch64::ZA)
  TileNum = cast<ConstantSDNode>(Node->getOperand(2))->getZExtValue();
llvm/lib/Target/AArch64/SMEInstrFormats.td
814

This wasn't mentioned in the commit message, but it looks like you're simplifying the patterns here because you can always use tileslice to get you base + offset, even if offset = 0? It's a nice clean-up!

llvm/test/CodeGen/AArch64/sme2-intrinsics-extract-mova.ll
97

This add doesn't seem to 'add' any value, if you excuse the pun. :) And the same thing for the other 64-bit variants.

287

Again, more adds of 0 here.

llvm/test/CodeGen/AArch64/sme2-intrinsics-insert-mova.ll
111

Again, could you remove the adds of 0 from this test file too?

kmclaughlin marked 9 inline comments as done.
kmclaughlin edited the summary of this revision. (Show Details)
  • Renamed the intrinsic class names in IntrinsicsAArch64.td to SME2_Matrix_TileVector*/SME2_ZA_ArrayVector*
  • Changed SelectSMETile to remove extra braces & fix indentation
  • Replaced the llvm_unreachable in SelectMultiVectorMove with return false
  • Removed adds of zero from tests
llvm/include/llvm/IR/IntrinsicsAArch64.td
2928

I've changed these classes to use the names suggested above as I think they are more accurate.

llvm/lib/Target/AArch64/SMEInstrFormats.td
814

Yes, this can be removed because offsets of 0 are handled by tileslice. I've added a note to the commit message about this change too :)

Rebased patch.

Thanks for addressing my many review comments! It's almost perfect. :)

llvm/test/CodeGen/AArch64/sme2-intrinsics-insert-mova.ll
537

For the vector to array writes I think we can test non-zero slices here too similar to @za_read_vg1x4_f64?

i.e.

%slice.7 = add i32 %slice, 7
call void @llvm.aarch64.sme.write.vg1x4.nxv2i64(i32 %slice.7, <vscale x 2 x i64> %za1, <vscale x 2 x i64> %za2, <vscale x 2 x i64> %za3, <vscale x 2 x i64> %za4)
kmclaughlin marked an inline comment as done.
  • Added multi-vector to ZA intrinsic tests with non-zero slices
david-arm accepted this revision.Jan 24 2023, 7:28 AM

LGTM! Merci beaucoup. :)

This revision is now accepted and ready to land.Jan 24 2023, 7:28 AM
This revision was landed with ongoing or failed builds.Jan 25 2023, 3:34 AM
This revision was automatically updated to reflect the committed changes.