This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add all SME2.1 instructions Assembly/Disassembly
ClosedPublic

Authored by CarolineConcatto on Nov 7 2022, 9:52 AM.

Details

Summary

This patch adds a new feature flag,
sme-f16f16, to represent FEAT_SME_F16F16.

This patch adds the following instructions:
SME2.1 standalone instructions:

MOVAZ (array to vector, four registers): Move and zero four ZA single-vector groups to vector registers.
      (array to vector, two registers): Move and zero two ZA single-vector groups to vector registers.
      (tile to vector, four registers): Move and zero four ZA tile slices to vector registers.
      (tile to vector, single): Move and zero ZA tile slice to vector register.
      (tile to vector, two registers): Move and zero two ZA tile slices to vector registers.

LUTI2 (Strided four registers): Lookup table read with 2-bit indexes.
      (Strided two registers): Lookup table read with 2-bit indexes.

LUTI4 (Strided four registers): Lookup table read with 4-bit indexes.
      (Strided two registers): Lookup table read with 4-bit indexes.

ZERO (double-vector): Zero ZA double-vector groups.
     (quad-vector): Zero ZA quad-vector groups.
     (single-vector): Zero ZA single-vector groups.
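The MOVAZ entries above describe a "move and zero" operation. As a rough illustration only, the Python sketch below models ZA as a list of equal-length rows and shows both effects together: the selected slices are copied out, then cleared in ZA. It ignores element size, vector length, tile versus array addressing and the vector-select register. The function name movaz and the row layout are assumptions made for this sketch, not LLVM or Arm definitions.

```python
from typing import List

# Hypothetical model: ZA is a list of equal-length rows ("slices"); each row
# stands in for one ZA tile slice or one ZA array vector.
Slice = List[float]


def movaz(za: List[Slice], first: int, count: int) -> List[Slice]:
    """Copy `count` consecutive slices starting at `first` out of ZA,
    then zero those slices in place (the "move and zero" behaviour)."""
    if count not in (1, 2, 4):
        raise ValueError("the MOVAZ forms move one, two or four slices")
    if first < 0 or first + count > len(za):
        raise IndexError("slice range lies outside the modelled ZA storage")
    moved: List[Slice] = []
    for i in range(first, first + count):
        moved.append(list(za[i]))   # "move": copy the slice into a result vector
        za[i] = [0.0] * len(za[i])  # "zero": clear the slice that was just read
    return moved


za = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
regs = movaz(za, 1, 2)
print(regs)  # [[3.0, 4.0], [5.0, 6.0]]
print(za)    # [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
```

Dropping the second assignment in the loop would turn this into a plain read that leaves ZA untouched.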

SME2p1 and SME-F16F16:
All of these instructions operate on half-precision elements:

FADD: Floating-point add multi-vector to ZA array vector accumulators.

FSUB: Floating-point subtract multi-vector from ZA array vector accumulators.

FMLA (multiple and indexed vector): Multi-vector floating-point fused multiply-add by indexed element.
     (multiple and single vector): Multi-vector floating-point fused multiply-add by vector.
     (multiple vectors): Multi-vector floating-point fused multiply-add.

FMLS (multiple and indexed vector): Multi-vector floating-point fused multiply-subtract by indexed element.
     (multiple and single vector): Multi-vector floating-point fused multiply-subtract by vector.
     (multiple vectors): Multi-vector floating-point fused multiply-subtract.

FCVT (widening): Multi-vector floating-point convert from half-precision to single-precision (in-order).

FCVTL: Multi-vector floating-point convert from half-precision to deinterleaved single-precision.

FMOPA (non-widening): Floating-point outer product and accumulate.

FMOPS (non-widening): Floating-point outer product and subtract.
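For the non-widening FMOPA/FMOPS entries above, a minimal Python sketch of the accumulation pattern may help: each tile element (i, j) is updated with the product of row element zn[i] and column element zm[j], and is left unchanged when either governing predicate element is inactive. This is an assumption-laden model: Python floats are doubles, so FP16 rounding and fused-multiply behaviour are not reproduced, and the function name fmopa is hypothetical.

```python
from typing import List, Sequence


def fmopa(tile: List[List[float]], zn: Sequence[float], zm: Sequence[float],
          pn: Sequence[bool], pm: Sequence[bool],
          subtract: bool = False) -> List[List[float]]:
    """Accumulate the outer product of zn and zm into `tile` in place.

    subtract=False models FMOPA, subtract=True models FMOPS. Element (i, j)
    is updated only when row predicate pn[i] and column predicate pm[j] are
    both active; otherwise it keeps its previous value.
    """
    sign = -1.0 if subtract else 1.0
    for i, (a, row_active) in enumerate(zip(zn, pn)):
        for j, (b, col_active) in enumerate(zip(zm, pm)):
            if row_active and col_active:
                tile[i][j] += sign * a * b
    return tile


tile = [[0.0, 0.0], [0.0, 0.0]]
print(fmopa(tile, [1.0, 2.0], [3.0, 4.0], [True, True], [True, False]))
# [[3.0, 0.0], [6.0, 0.0]]
```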

SME2p1 and B16B16:

BFADD: BFloat16 floating-point add multi-vector to ZA array vector accumulators.

BFSUB: BFloat16 floating-point subtract multi-vector from ZA array vector accumulators.

BFCLAMP: Multi-vector BFloat16 floating-point clamp to minimum/maximum number.

BFMLA (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-add by indexed element.
      (multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-add by vector.
      (multiple vectors): Multi-vector BFloat16 floating-point fused multiply-add.

BFMLS (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-subtract by indexed element.
      (multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-subtract by vector.
      (multiple vectors): Multi-vector BFloat16 floating-point fused multiply-subtract.

BFMAX (multiple and single vector): Multi-vector BFloat16 floating-point maximum by vector.
      (multiple vectors): Multi-vector BFloat16 floating-point maximum.

BFMAXNM (multiple and single vector): Multi-vector BFloat16 floating-point maximum number by vector.
        (multiple vectors): Multi-vector BFloat16 floating-point maximum number.

BFMIN (multiple and single vector): Multi-vector BFloat16 floating-point minimum by vector.
      (multiple vectors): Multi-vector BFloat16 floating-point minimum.

BFMINNM (multiple and single vector): Multi-vector BFloat16 floating-point minimum number by vector.
        (multiple vectors): Multi-vector BFloat16 floating-point minimum number.

BFMOPA (non-widening): BFloat16 floating-point outer product and accumulate.

BFMOPS (non-widening): BFloat16 floating-point outer product and subtract.
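The list above pairs BFMAX with BFMAXNM (and BFMIN with BFMINNM). The practical difference is NaN handling: the plain maximum returns a NaN if either input is a NaN, while the "maximum number" form returns the numeric operand when the other one is a quiet NaN. The Python sketch below illustrates that per-element rule under simplifying assumptions: it works on Python doubles rather than BFloat16, treats every NaN as quiet, and ignores the default-NaN control. The function names are hypothetical; the minimum forms mirror these.

```python
import math


def fmax(a: float, b: float) -> float:
    """BFMAX-style maximum: a NaN in either operand gives a NaN result."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        # +0.0 compares equal to -0.0; the maximum of the two is +0.0
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def fmaxnm(a: float, b: float) -> float:
    """BFMAXNM-style maximum number: if exactly one operand is a (quiet)
    NaN, the other, numeric operand is returned."""
    if math.isnan(a) and not math.isnan(b):
        return b
    if math.isnan(b) and not math.isnan(a):
        return a
    return fmax(a, b)


print(fmax(1.0, math.nan))    # nan
print(fmaxnm(1.0, math.nan))  # 1.0
```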

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

Depends on: D137410


Event Timeline

Herald added a project: Restricted Project. Nov 7 2022, 9:52 AM
CarolineConcatto requested review of this revision. Nov 7 2022, 9:52 AM
CarolineConcatto edited the summary of this revision.
  • Rebase

I've only glanced at the patch so far (will take a proper look tomorrow) but I've noticed what looks like a couple of rebase issues.

llvm/include/llvm/Support/AArch64TargetParser.def
133

This already exists further down.

134

Can you move this to just before sme-f64f64?

llvm/include/llvm/Support/AArch64TargetParser.h
78–80

It looks like you've inadvertently switched these during a rebase.

llvm/lib/Target/AArch64/AArch64.td
174–175

Please move this so it sits nearer the other SME feature flags.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
155–156

Please move this just before HasSMEF64F64.

CarolineConcatto marked 5 inline comments as done.
  • Address initial review comments
david-arm added inline comments. Nov 9 2022, 7:52 AM
llvm/lib/Target/AArch64/AArch64.td
483

I think this should be FEAT_SME_F16F16, instead of FEAT_SME-F16F16

llvm/lib/Target/AArch64/SMEInstrFormats.td
3428

I couldn't seem to find any case where op{3} is known to be 0b1 unless I've misunderstood something?

llvm/test/MC/AArch64/SME2/fmla-diagnostics.s
48

It looks like these were dodgy tests before. Is it possible to change the instruction to still get the same error as before?

llvm/test/MC/AArch64/SME2/fmls-diagnostics.s
33

Same comment as above.

CarolineConcatto marked 3 inline comments as done.
  • Address review comments
llvm/lib/Target/AArch64/SMEInstrFormats.td
3428

It should be 4: for MOVAZ, the output should also include tile_ty:$_ZAn.

llvm/include/llvm/Support/AArch64TargetParser.h
79

Rogue comma at end of list.

llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
793

This is missing a _H suffix. To be honest, you'll need a multiclass for code generation support anyway, so you may as well follow the usual idiom and make sme2p1_luti4_vector_vg4_index a multiclass.

796

Is it correct to require SME2p1 in order to allow SMEF16F16? The documentation suggests FEAT_SME_F16F16 is read independently of other feature flags. Perhaps requiring SME2 is a better base requirement as this is the feature that added the core support for these instructions.

823

Same question as above regarding whether requiring SME2p1 is too stringent.

llvm/lib/Target/AArch64/SMEInstrFormats.td
1341–1342

Please can you pass in vg so we can have let Inst{16} = vg;

2048

16/32-bit? Or perhaps move the comment to the 32-bit specific multiclass.

2049

I think it'll be clearer if you pass in bit sz for bit 22 and have a 6-bit opc that represents {12-10:5-3}; that way, the base instruction class has no holes.

Perhaps drop _32b?
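The suggestion above (a 1-bit sz at bit 22 plus one 6-bit opc whose upper three bits land in Inst{12-10} and lower three bits in Inst{5-3}) can be modelled outside TableGen to check the bit placement. In TableGen this would roughly correspond to let Inst{22} = sz; let Inst{12-10} = opc{5-3}; let Inst{5-3} = opc{2-0};. The Python sketch below is only an illustration: the helper names set_field and place_sz_opc are hypothetical, and the base word is zero rather than any real SME2 encoding.

```python
def set_field(word: int, hi: int, lo: int, value: int) -> int:
    """Return `word` with bits hi..lo (inclusive) replaced by `value`."""
    width = hi - lo + 1
    if value < 0 or value >> width:
        raise ValueError(f"value {value:#x} does not fit in {width} bit(s)")
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | (value << lo)


def place_sz_opc(word: int, sz: int, opc: int) -> int:
    """Place a 1-bit sz at bit 22 and a 6-bit opc split as {12-10:5-3}:
    opc{5-3} goes to bits 12-10 and opc{2-0} goes to bits 5-3."""
    if not 0 <= opc < 64:
        raise ValueError("opc must be a 6-bit value")
    word = set_field(word, 22, 22, sz)
    word = set_field(word, 12, 10, (opc >> 3) & 0b111)
    word = set_field(word, 5, 3, opc & 0b111)
    return word


print(hex(place_sz_opc(0, 1, 0b101011)))  # 0x401418
```

Keeping opc as one contiguous operand, even though its bits are split in the instruction word, means each derived class passes a single value that lines up with the encoding-group tables.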

2146

The same comments I had for sme2_multi_vec_array_vg2_index_32b apply here.

3592

indent me

4259

I think this class requires let Constraints = "$ZAd = $_ZAd";?

4270

Please can you make opc a 6-bit opcode so it matches the encoding group?

4320

Please can opc cover {18-15} to better reflect the encoding group?

4322

Please can sz remain as 2-bit to match the encoding group?

4369

Please can opc cover {18-16} to better reflect the encoding group?

4371

Please can sz remain as 2-bit to match the encoding group?

CarolineConcatto marked 10 inline comments as done.
  • Address review comments
CarolineConcatto marked 2 inline comments as done. Nov 14 2022, 1:51 AM
CarolineConcatto added inline comments.
llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
796

I had this conversation before posting the patches upstream. The developer page is not clear on this point, but it was agreed that these instructions are optional for sme2p1.

paulwalker-arm accepted this revision. Nov 14 2022, 4:51 AM

One final request but otherwise looks good.

llvm/lib/Target/AArch64/SMEInstrFormats.td
4402

This should be _H? Please can this multiclass take the same parameters as sme2p1_luti2_vector_vg4_index? I know it seems redundant but it'll make things easier if other variants are added later.

This revision is now accepted and ready to land. Nov 14 2022, 4:51 AM
This revision was landed with ongoing or failed builds. Nov 14 2022, 6:56 AM
This revision was automatically updated to reflect the committed changes.
CarolineConcatto marked an inline comment as done.