This is an archive of the discontinued LLVM Phabricator instance.

[AArch64]SME2 Multi vector Sel Load and Store instructions
ClosedPublic

Authored by CarolineConcatto on Oct 18 2022, 7:09 AM.

Details

Summary

This patch adds the assembly/disassembly for the following instruction:

SEL: Multi-vector conditionally select elements from two vectors
     for 2 and 4 registers

Non-constiguous load with stride resgisters:

LD1B (scalar + immediate): Contiguous load of bytes to multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous load of bytes to multiple strided vectors (scalar index).
LD1D (scalar + immediate): Contiguous load of doublewords to multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous load of doublewords to multiple strided vectors (scalar index).
LD1H (scalar + immediate): Contiguous load of halfwords to multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous load of halfwords to multiple strided vectors (scalar index).
LD1W (scalar + immediate): Contiguous load of words to multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous load of words to multiple strided vectors (scalar index).

LDNT1B (scalar + immediate): Contiguous load non-temporal of bytes to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load non-temporal of bytes to multiple strided vectors (scalar index).
LDNT1D (scalar + immediate): Contiguous load non-temporal of doublewords to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load non-temporal of doublewords to multiple strided vectors (scalar index).
LDNT1H (scalar + immediate): Contiguous load non-temporal of halfwords to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load non-temporal of halfwords to multiple strided vectors (scalar index).
LDNT1W (scalar + immediate): Contiguous load non-temporal of words to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load non-temporal of words to multiple strided vectors (scalar index).

Non-constiguous store with stride resgisters:

ST1B (scalar + immediate): Contiguous store of bytes from multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous store of bytes from multiple strided vectors (scalar index).
ST1D (scalar + immediate): Contiguous store of doublewords from multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous store of doublewords from multiple strided vectors (scalar index).
ST1H (scalar + immediate): Contiguous store of halfwords from multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous store of halfwords from multiple strided vectors (scalar index).
ST1W (scalar + immediate): Contiguous store of words from multiple strided vectors (immediate index).
     (scalar + scalar): Contiguous store of words from multiple strided vectors (scalar index).

STNT1B (scalar + immediate): Contiguous store non-temporal of bytes from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store non-temporal of bytes from multiple strided vectors (scalar index).
STNT1D (scalar + immediate): Contiguous store non-temporal of doublewords from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store non-temporal of doublewords from multiple strided vectors (scalar index).
STNT1H (scalar + immediate): Contiguous store non-temporal of halfwords from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store non-temporal of halfwords from multiple strided vectors (scalar index).
STNT1W (scalar + immediate): Contiguous store non-temporal of words from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store non-temporal of words from multiple strided vectors (scalar index).

  The reference can be found here:

      https://developer.arm.com/documentation/ddi0602/2022-09

This patch also adds a new SVE vector list to represent the stride loads/stores
ZPRVectorListStrided and the sets of 2 and 4 ZA registers:
ZZ_[b|h|w|d]_strided and ZZZZ_[b|h|w|d]_strided

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2022, 7:09 AM
CarolineConcatto requested review of this revision.Oct 18 2022, 7:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2022, 7:09 AM
Matt added a subscriber: Matt.Oct 21 2022, 12:39 PM
llvm/test/MC/AArch64/SVE/ld2b-diagnostics.s
84

Sorry I've not taken a proper look yet but this change looks like a step backward compared to the original informative error. Is this fixable?

llvm/test/MC/AArch64/SVE/ld2b-diagnostics.s
84

So the problem here is that now we are able to parse the instructions. I added
ZZ_[b|h|s|d]_strided and ZZZZ_[b|h|s|d]_strided. So all the operands are valid
So this set of registers is valid, but the instruction is not compatible with any other available instructions. That is the reason we print invalid operand for instruction.

I guess this patch needs updating? because the predicate as counter work has landed separately and I see support for new loads and stores that's not mentioned in the summary.

CarolineConcatto retitled this revision from [AArch64]SME2 Multi vector Sel instructions to [AArch64]SME2 Multi vector Sel Load and Store instructions.
CarolineConcatto edited the summary of this revision. (Show Details)

update commit message

I've not reviewed the td files yet but...

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
1419–1430

Is it not possible to reuse isTypedVectorList much like you did for isTypedVectorListMultiple?

1800

Do you need to be stricter here because I believe only z0-z7 is allowed here and the else block only allows z16-z23? Either that or you need some extra asserts before each of the calls to Inst.addOperand.

My guess is the asserts are enough because we should have already failed before getting here if the registers are invalid.

llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
1540

Can this be put after NumRegs > 1 && so we don't call getNextVectorRegister unnecessarily?

1555–1556

Is this necessary? Is it possible for a register to be in AArch64::ZPR2RegClassID but not in AArch64::ZPRRegClassID? I'm just wondering why we've not hit this before.

llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp
196–198

This looks like a bogus rebase issue due to the function being renamed to EncodePPR_p8to15.

paulwalker-arm added inline comments.Nov 8 2022, 7:54 AM
llvm/lib/Target/AArch64/SMEInstrFormats.td
3635–3636

Please can you pass in op0:op1:op2 into this class then use {0, 1, 0, 0, 0, 0} and {?, 0, ?, 0, ?, 0} accordingly?

3724–3735

I'm going to surprise/annoy you here but there's more going on here than I typically like. Do you mind not having the vg24 base classes for the LD1/ST1 instructions in this patch and just have the vg2/vg4 classes fully define Inst?

CarolineConcatto marked 7 inline comments as done.
  • Address review comments
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
1800

I believe you want to check if the first register is in range, correct? If so, then I believe I have addressed your comment.

llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp
196–198

I remove the function, but forgot the header.

llvm/lib/Target/AArch64/SMEInstrFormats.td
3724–3735

I believe you only want for the scalar-scalar instruction, is that correct?
The scalar+immediate I need the multiclass so I believe it is ok to have some bit defined there.
I have change to use your {?,?} scheme.

paulwalker-arm added inline comments.Nov 9 2022, 10:12 AM
llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
655

indent me

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
1800

Does getVectorListStart() < AArch64::Z16 not work? Seems like it would make this function more readable.

llvm/lib/Target/AArch64/SMEInstrFormats.td
3724–3735

Yes that's fine, thanks for making the change @CarolineConcatto.

3735–3738

indent me

3740–3741

What does this InstAlias cover? too me it looks identical to the format in sme2_ld_vector_vg24_multi_scalar_immediate. The same question likely applies to the other multiclasses.

3754–3757

indent me

3812–3813

This is missing let Inst{2} = 0b0;.

3848–3851

indent me

3866–3869

indent me

CarolineConcatto marked 6 inline comments as done.
  • address review comments
llvm/lib/Target/AArch64/SMEInstrFormats.td
3812–3813

Thank you Paul! I would have thought the build would have shown that, specially the encoding test.
I think that if we don't set the value it becomes zero.

paulwalker-arm accepted this revision.EditedNov 10 2022, 7:31 AM

For all my "indent me" comments you've missed the closing }, but otherwise looks good.

This revision is now accepted and ready to land.Nov 10 2022, 7:31 AM
This revision was landed with ongoing or failed builds.Nov 10 2022, 8:06 AM
This revision was automatically updated to reflect the committed changes.