This patch adds the assembly/disassembly for the following instructions:
  ADD (to vector): Add replicated single vector to multi-vector with multi-vector result.
  SQDMULH (multiple and single vector): Multi-vector signed saturating doubling multiply high by vector.
for 2 and 4 ZA SVE registers.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
It also adds more element sizes for the multiple-register tuple operands:
ZZ_b_mul_r, ZZ_h_mul_r, ZZZZ_b_mul_r, ZZZZ_h_mul_r,
for 8 bits and 16 bits with 2 and 4 ZA registers.
Depends on: D135468
Using ZPR2 here (and ZPR4 for ZZZZ_b_mul_r) isn't correct.
ZPR2 allows:
  {z0.b, z1.b}
  {z1.b, z2.b}
  {z2.b, z3.b}
  {z3.b, z4.b}
  ...
But {z1.b, z2.b} and {z3.b, z4.b} are not valid for ZZ_b_mul_r, because the first register must be a multiple of 2.
To fix this, you can create a new register class that only takes the "even" pairs (for ZPR2) or every fourth quad (for ZPR4) like this:
  // SME2 multiple-of-2 or 4 multi-vector operands
  def ZPR2Mul2 : RegisterClass<"AArch64", [untyped], 128, (add (decimate ZSeqPairs, 2))> {
    let Size = 256;
  }

  def ZPR4Mul4 : RegisterClass<"AArch64", [untyped], 128, (add (decimate ZSeqQuads, 4))> {
    let Size = 512;
  }
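
To illustrate which tuples the decimated classes would accept, here is a rough Python model of the selection. It is only a sketch. It assumes that ZSeqPairs and ZSeqQuads enumerate 32 consecutive tuples starting at z0, z1, ..., z31 with wrap-around, and that decimate keeps every N-th element starting with the first. The helper names seq_tuples and decimate below are hypothetical and are not LLVM APIs.

```python
NUM_Z_REGS = 32  # z0..z31


def seq_tuples(width, num_regs=NUM_Z_REGS):
    """Model of a sequential tuple list: one tuple per starting register.

    Assumption: tuples wrap around at the end, e.g. (z31, z0) for width 2.
    """
    return [
        tuple(f"z{(start + i) % num_regs}" for i in range(width))
        for start in range(num_regs)
    ]


def decimate(items, n):
    """Model of the set operator: keep every n-th element, starting with the first."""
    return items[::n]


# Stand-ins for the ZPR2Mul2 and ZPR4Mul4 register classes suggested above.
pairs_mul2 = decimate(seq_tuples(2), 2)
quads_mul4 = decimate(seq_tuples(4), 4)

print(pairs_mul2[:3])
print(quads_mul4[:2])
```

Under these assumptions, the odd-based pairs such as {z1, z2} and {z3, z4} drop out, and every remaining tuple starts at a register number that is a multiple of 2 (pairs) or 4 (quads), which is the constraint the *_mul_r operands need.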