This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SME] Add zero instruction
ClosedPublic

Authored by c-rhodes on Jul 7 2021, 11:38 AM.

Details

Summary

This patch adds the zero instruction for zeroing a list of 64-bit
element ZA tiles. The instruction takes a list of up to eight tiles
ZA0.D-ZA7.D, which must be in order, e.g.

zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d}
zero {za1.d,za3.d,za5.d,za7.d}

The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which
are mapped to corresponding 64-bit element tiles in accordance with the
architecturally defined mapping between different element size tiles,
e.g.

  • Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing all eight 64-bit element tiles ZA0.D to ZA7.D.
  • Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D.

The preferred disassembly of this instruction uses the shortest list of
tile names that represent the encoded immediate mask, e.g.

  • An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and ZA5.D is disassembled as {ZA0.S, ZA1.S}.
  • An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and ZA6.D is disassembled as {ZA0.H}.
  • An all-ones immediate is disassembled as {ZA}.
  • An all-zeros immediate is disassembled as an empty list {}.

This patch adds the MatrixTileList asm operand and related parsing to support
this.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Diff Detail

Event Timeline

c-rhodes created this revision.Jul 7 2021, 11:38 AM
c-rhodes requested review of this revision.Jul 7 2021, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2021, 11:38 AM
Matt added a subscriber: Matt.Jul 7 2021, 3:02 PM
c-rhodes updated this revision to Diff 358605.Jul 14 2021, 7:52 AM

Changes:

  • Updated sed line in tests to fix Windows precommit.
  • Fixed bug in parsing of tile list that permitted matrix operand with row or column indicator, e.g. zero {za0h.b}.
c-rhodes edited the summary of this revision. (Show Details)Jul 19 2021, 3:27 AM

Hi @c-rhodes, I'm only part-way through the patch, but here are some minor comments I have so far!

llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1357

This seems to be unused - can we delete the argument?

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
2066

What if the pair doesn't make sense? For example, {16, AArch64::ZAD3}. In this case OutRegs will be returned empty.

2285

Given that you've now hard-coded properties about the maximum twice now (once above with RegMask <= 0xFF) and here with MaxBits = 8, is it worth having a constant declared somewhere that you can refer to? For example, MaxBits could be a constant in MatrixTileListOp and then

RegMask = (1 << MaxBits ) - 1;
2287

Do we need a separator here, like ' '?

llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
708

Again, another hard-coded value here.

david-arm added inline comments.Jul 19 2021, 6:01 AM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
3798

Is 'RegNum' guaranteed to always be > 0 for a valid register?

3803

To be honest it's quite hard to review this because I don't know what parseVectorKind returns. Can we specify the type instead of 'auto'?

llvm/lib/Target/AArch64/SMEInstrFormats.td
688

Are we missing ones for {za1.s,za3.s} and {z0.s,za2.s}?

Hi @c-rhodes, we had done some design work for the ZERO instruction, and it is interesting to see your implementation. I have some questions about the code, based on my understanding of the ISA.

llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1375

Does this restrict a matrix tile list to contain only tiles of the same element type? The ISA documentation doesn't impose such a restriction, so I think it would be legal to write something like zero { za0.s, za5.d }. Did you consider supporting matrix tile lists of mixed element types?

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
3862

Is this specified in the ISA documentation? I thought matrix tile lists of mixed element types were allowed.

3868

Is this specified in the ISA documentation? It isn't obvious that the tiles in the list must be ordered.

llvm/lib/Target/AArch64/SMEInstrFormats.td
661

Wouldn't it be better for the instruction to accept an output RegisterOperand that covers all possible tile types, which would allow register allocation to assign the appropriate tile registers? Doing so would also allow us to distinguish between zero { za0.d, za4.d } and zero { za0.s } when we need to parse assembly code back into machine IR with correct register semantics.

688

Why are these implemented as InstAlias? Can they not be parsed by AArch64AsmParser::tryParseMatrixTileList?

Hi @c-rhodes, I'm only part-way through the patch, but here are some minor comments I have so far!

Thanks for the comments Dave, I've not responded to them all yet but I'll update the patch shortly to address them.

Hi @c-rhodes, we had done some design work for the ZERO instruction, and it is interesting to see your implementation. I have some questions about the code, based on my understanding of the ISA.

Hi @bryanpkc, thanks for the comments. Do you also have an implementation for this?

llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1375

Does this restrict a matrix tile list to contain only tiles of the same element type? The ISA documentation doesn't impose such a restriction, so I think it would be legal to write something like zero { za0.s, za5.d }. Did you consider supporting matrix tile lists of mixed element types?

Yeah the element types must be the same. The ISA docs don't explicitly impose that restriction, but the actual instruction takes a list of 64-bit element types, the other types are really aliases.

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
3868

Is this specified in the ISA documentation? It isn't obvious that the tiles in the list must be ordered.

No, but I'm not sure there's a good reason for the tiles to not be in order?

llvm/lib/Target/AArch64/SMEInstrFormats.td
661

Wouldn't it be better for the instruction to accept an output RegisterOperand that covers all possible tile types, which would allow register allocation to assign the appropriate tile registers? Doing so would also allow us to distinguish between zero { za0.d, za4.d } and zero { za0.s } when we need to parse assembly code back into machine IR with correct register semantics.

I've not given that a great deal of thought yet to be honest, our focus is on MC layer support at the moment, may revisit this in the future.

688

Are we missing ones for {za1.s,za3.s} and {z0.s,za2.s}?

These aliases are for the preferred disassembly, the constraint is the instruction uses the shortest list of tile names that represent the encoded immediate mask. The parsed tile list gets converted (not necessary if input is .D tiles) to 64-bit tiles and then encoded as an 8-bit mask, a bit for each tile (za0.d-za7.d).

For {za1.s,za3.s}, the mapping is:

za1.s -> {za1.d, za5.d}
za3.s -> {za3.d, za7.d}
      -> {za1.d, za3.d, za5.d, za7.d} (mask: 10101010)

the shortest possible tile list for this mask is {za1.h} and that's defined above. Same principle applies for {z0.s,za2.s}.

For reference, the following table describes all possible aliases:

The following table describes all possible aliases:

          mask                                                 in                 preferred
0   0b11111111                                          zero {za}                 zero {za}
1   0b11111111                                       zero {za0.b}                 zero {za}
2   0b01010101                                       zero {za0.h}              zero {za0.h}
3   0b10101010                                       zero {za1.h}              zero {za1.h}
4   0b11111111                                 zero {za0.h,za1.h}                 zero {za}
5   0b00010001                                       zero {za0.s}              zero {za0.s}
6   0b00100010                                       zero {za1.s}              zero {za1.s}
7   0b01000100                                       zero {za2.s}              zero {za2.s}
8   0b10001000                                       zero {za3.s}              zero {za3.s}
9   0b00110011                                 zero {za0.s,za1.s}        zero {za0.s,za1.s}
10  0b01010101                                 zero {za0.s,za2.s}              zero {za0.h}
11  0b10011001                                 zero {za0.s,za3.s}        zero {za0.s,za3.s}
12  0b01100110                                 zero {za1.s,za2.s}        zero {za1.s,za2.s}
13  0b10101010                                 zero {za1.s,za3.s}              zero {za1.h}
14  0b11001100                                 zero {za2.s,za3.s}        zero {za2.s,za3.s}
15  0b01110111                           zero {za0.s,za1.s,za2.s}  zero {za0.s,za1.s,za2.s}
16  0b10111011                           zero {za0.s,za1.s,za3.s}  zero {za0.s,za1.s,za3.s}
17  0b11011101                           zero {za0.s,za2.s,za3.s}  zero {za0.s,za2.s,za3.s}
18  0b11101110                           zero {za1.s,za2.s,za3.s}  zero {za1.s,za2.s,za3.s}
19  0b11111111                     zero {za0.s,za1.s,za2.s,za3.s}                 zero {za}
20  0b11111111  zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6....                 zero {za}
21  0b01010101                     zero {za0.d,za2.d,za4.d,za6.d}              zero {za0.h}
22  0b10101010                     zero {za1.d,za3.d,za5.d,za7.d}              zero {za1.h}
23  0b00010001                                 zero {za0.d,za4.d}              zero {za0.s}
24  0b00100010                                 zero {za1.d,za5.d}              zero {za1.s}
25  0b01000100                                 zero {za2.d,za6.d}              zero {za2.s}
26  0b10001000                                 zero {za3.d,za7.d}              zero {za3.s}
27  0b00110011                     zero {za0.d,za1.d,za4.d,za5.d}        zero {za0.s,za1.s}
28  0b10011001                     zero {za0.d,za3.d,za4.d,za7.d}        zero {za0.s,za3.s}
29  0b01100110                     zero {za1.d,za2.d,za5.d,za6.d}        zero {za1.s,za2.s}
30  0b11001100                     zero {za2.d,za3.d,za6.d,za7.d}        zero {za2.s,za3.s}
31  0b01110111         zero {za0.d,za1.d,za2.d,za4.d,za5.d,za6.d}  zero {za0.s,za1.s,za2.s}
32  0b10111011         zero {za0.d,za1.d,za3.d,za4.d,za5.d,za7.d}  zero {za0.s,za1.s,za3.s}
33  0b11011101         zero {za0.d,za2.d,za3.d,za4.d,za6.d,za7.d}  zero {za0.s,za2.s,za3.s}
34  0b11101110         zero {za1.d,za2.d,za3.d,za5.d,za6.d,za7.d}  zero {za1.s,za2.s,za3.s}
688

Why are these implemented as InstAlias? Can they not be parsed by AArch64AsmParser::tryParseMatrixTileList?

See my comment below regarding aliases. To expand on the parsing a bit, tryParseMatrixTileList will parse tile lists with 8/16/32 or 64-bit element types, the non 64-bit types are treated as aliases and get converted to .D tiles, then encoded as an 8-bit mask.

Hi @bryanpkc, thanks for the comments. Do you also have an implementation for this?

Yes, we implemented the SME instructions internally. Most of our code is very similar to the patches you have upstreamed, but ZERO was tricky and our approach differs quite a bit.

llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1375

I understand that the bits in the immediate operand refer to .d tiles, but it seems useful to allow a programmer to keep referring to the register operands as .h and .d (for example), if that's how the registers are used after they are zeroed.

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
3868

IMO the assembler should be more forgiving here, e.g. when the user programmatically generates the assembly code and forgets to sort the registers. We may also want to be consistent with the ARM backend, which produces a warning, not an error, in a similar situation:

        .text
foo.s:1:14: warning: register list not in ascending order
stm sp!, {r5,r3,r0}
             ^
foo.s:1:17: warning: register list not in ascending order
stm sp!, {r5,r3,r0}
                ^
        stm     sp!, {r0, r3, r5}
llvm/lib/Target/AArch64/SMEInstrFormats.td
661

We also went with an immediate operand at first, but eventually replaced it with a register operand, mainly to allow register allocation to work.

c-rhodes added inline comments.Jul 20 2021, 8:49 AM
llvm/lib/Target/AArch64/SMEInstrFormats.td
661

We also went with an immediate operand at first, but eventually replaced it with a register operand, mainly to allow register allocation to work.

Your approach sounds more complete, I'd be interested in taking a look, are you able to upstream it?

sdesmalen added inline comments.Jul 20 2021, 10:59 PM
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
3868

I agree that if the spec doesn't require the operands to be in a specific order, that the instruction should accept the operands in any order.

llvm/lib/Target/AArch64/SMEInstrFormats.td
661

@bryanpkc you make a good point and it would be interested to see those patches!

For this patch I think that unless the changes you're suggesting are trivial, it would make sense to have any changes that are not required for the assembler as follow-up patches. I'm a bit cautious about this otherwise holding up SME asm support into LLVM 13, since those changes aren't necessarily required for the assembler.

c-rhodes updated this revision to Diff 360471.Jul 21 2021, 8:34 AM
c-rhodes set the repository for this revision to rG LLVM Github Monorepo.
  • Address comments.
  • Replaced MatrixTileList<EltSize> operands with a single operand. Since legal tiles (8/16/32) get mapped to 64-bit tiles then register mask, and the shortest possible tile lists are defined via aliases, individual operands for each elt size isn't necessary and adds complication.
c-rhodes marked 2 inline comments as done.Jul 21 2021, 8:43 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
2066

What if the pair doesn't make sense? For example, {16, AArch64::ZAD3}. In this case OutRegs will be returned empty.

I've change the match register name function (matchMatrixTileListRegName) to only match tiles that are valid for tile lists, now we shouldn't end up here I've added an assert that the pair isn't empty.

2287

Do we need a separator here, like ' '?

I don't think so, this just emits the bits, e.g.

zero {za0.s, za2.s} -> <matrixlist 01010101

although I realise now the closing > is missing, I'll fix that.

3798

Is 'RegNum' guaranteed to always be > 0 for a valid register?

Yeah 0 is NoRegister, defined in llvm/include/llvm/MC/MCRegister.h:
static constexpr unsigned NoRegister = 0u;

I suppose the match register functions could default on NoRegister to be more explicit, but not sure how useful that is.

3868

IMO the assembler should be more forgiving here, e.g. when the user programmatically generates the assembly code and forgets to sort the registers. We may also want to be consistent with the ARM backend, which produces a warning, not an error, in a similar situation:

        .text
foo.s:1:14: warning: register list not in ascending order
stm sp!, {r5,r3,r0}
             ^
foo.s:1:17: warning: register list not in ascending order
stm sp!, {r5,r3,r0}
                ^
        stm     sp!, {r0, r3, r5}

Fair point, I've changed it to a warning.

llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
708

Again, another hard-coded value here.

not really sure where I could put a constant here

@bryanpkc you make a good point and it would be interested to see those patches!

For this patch I think that unless the changes you're suggesting are trivial, it would make sense to have any changes that are not required for the assembler as follow-up patches. I'm a bit cautious about this otherwise holding up SME asm support into LLVM 13, since those changes aren't necessarily required for the assembler.

@c-rhodes @sdesmalen I'm working on getting approvals to open our code. I don't know exactly how long it will take, but hopefully it won't be too long. I agree that we can consider that issue in a follow-up patch.

@bryanpkc you make a good point and it would be interested to see those patches!

For this patch I think that unless the changes you're suggesting are trivial, it would make sense to have any changes that are not required for the assembler as follow-up patches. I'm a bit cautious about this otherwise holding up SME asm support into LLVM 13, since those changes aren't necessarily required for the assembler.

@c-rhodes @sdesmalen I'm working on getting approvals to open our code. I don't know exactly how long it will take, but hopefully it won't be too long. I agree that we can consider that issue in a follow-up patch.

Great, thanks @bryanpkc

david-arm accepted this revision.Jul 26 2021, 7:05 AM

LGTM! Thanks for dealing with all the comments @c-rhodes!

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
2287

nit: I think it's still missing a closing '>'

This revision is now accepted and ready to land.Jul 26 2021, 7:05 AM
This revision was landed with ongoing or failed builds.Jul 27 2021, 1:36 AM
This revision was automatically updated to reflect the committed changes.