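The indexed variants of vfmal.f16 and vfmsl.f16 accept only a restricted register range in the indexed operand: d0-d7 for the Q-register form (Qd, Dn, Dm[x]) and s0-s15 for the D-register form (Dd, Sn, Sm[x]). The encoding has only 3 bits (Q form) or 4 bits (D form) for that register number.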
Reference:
https://developer.arm.com/docs/ddi0597/b/simd-and-floating-point-instructions-alphabetic-order/vfmal-by-scalar-vector-floating-point-multiply-add-long-to-accumulator-by-scalar
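Currently any register is accepted, and the upper two bits (Q form) or the upper bit (D form) of its register number are silently dropped. As a result, d8 and d16 encode as d0, and s16 encodes as s0: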
  llvm-mc -triple arm -mattr=+fp16fml,+neon -show-encoding -o - -
  vfmsl.f16 q0, d1, d0[0]
  vfmsl.f16 q0, d1, d8[0]
  vfmsl.f16 q0, d1, d16[0]
          .text
          vfmsl.f16 q0, d1, d0[0]         @ encoding: [0x50,0x08,0x11,0xfe]
          vfmsl.f16 q0, d1, d8[0]         @ encoding: [0x50,0x08,0x11,0xfe]
          vfmsl.f16 q0, d1, d16[0]        @ encoding: [0x50,0x08,0x11,0xfe]
  vfmsl.f16 d0, s1, s0[0]
  vfmsl.f16 d0, s1, s16[0]
          vfmsl.f16 d0, s1, s0[0]         @ encoding: [0x90,0x08,0x10,0xfe]
          vfmsl.f16 d0, s1, s16[0]        @ encoding: [0x90,0x08,0x10,0xfe]
This patch restricts the indexed operand to the d0-d7 and s0-s15 ranges, respectively, so out-of-range registers are rejected instead of being silently mis-encoded.
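To illustrate why only d0-d7 and s0-s15 fit, here is a minimal sketch that builds the A32 encoding of the by-scalar VFMSL/VFMAL.f16 forms from their bit fields and rejects any indexed register the fields cannot hold. The helper names (`encode_q_form`, `encode_d_form`, `fmt`) are hypothetical and are not LLVM's API. The field layout follows the Arm reference linked above: the Q form stores Dm in Vm<2:0> and the index in M:Vm<3>, while the D form stores Sm in Vm<2:0>:M and the index in Vm<3>.

```python
from typing import Optional

# Hypothetical helpers (not LLVM's API): build the A32 encoding of
# VFMAL/VFMSL.f16 (by scalar) and reject an indexed register that the
# encoding cannot represent, instead of silently masking its upper bits.


def encode_q_form(sub: bool, qd: int, dn: int, dm: int, index: int) -> Optional[int]:
    """vfm{a,s}l.f16 Qd, Dn, Dm[index]: Dm must be d0-d7, index 0-3."""
    if not (0 <= qd <= 15 and 0 <= dn <= 31 and 0 <= dm <= 7 and 0 <= index <= 3):
        return None  # only Vm<2:0> holds the register number
    d = 2 * qd  # Qd is encoded as the even D register D:Vd
    enc = 0xFE000800 | (1 << 6) | (1 << 4)  # fixed bits, Q = 1
    enc |= int(sub) << 20  # S: 0 = VFMAL, 1 = VFMSL
    enc |= ((d >> 4) & 1) << 22 | (d & 0xF) << 12  # D:Vd
    enc |= ((dn >> 4) & 1) << 7 | (dn & 0xF) << 16  # N:Vn
    enc |= (index >> 1) << 5 | (index & 1) << 3 | dm  # M:Vm<3> = index, Vm<2:0> = Dm
    return enc


def encode_d_form(sub: bool, dd: int, sn: int, sm: int, index: int) -> Optional[int]:
    """vfm{a,s}l.f16 Dd, Sn, Sm[index]: Sm must be s0-s15, index 0-1."""
    if not (0 <= dd <= 31 and 0 <= sn <= 31 and 0 <= sm <= 15 and 0 <= index <= 1):
        return None  # only Vm<2:0>:M holds the register number
    enc = 0xFE000800 | (1 << 4)  # fixed bits, Q = 0
    enc |= int(sub) << 20  # S: 0 = VFMAL, 1 = VFMSL
    enc |= ((dd >> 4) & 1) << 22 | (dd & 0xF) << 12  # D:Vd
    enc |= (sn >> 1) << 16 | (sn & 1) << 7  # Vn:N
    enc |= (sm >> 1) | (sm & 1) << 5 | index << 3  # Vm<2:0>:M = Sm, Vm<3> = index
    return enc


def fmt(enc: Optional[int]) -> str:
    # Little-endian bytes, in the style of llvm-mc's "@ encoding:" comment.
    if enc is None:
        return "rejected (out of range)"
    return "[" + ",".join(f"0x{b:02x}" for b in enc.to_bytes(4, "little")) + "]"


if __name__ == "__main__":
    for dm in (0, 8, 16):
        print(f"vfmsl.f16 q0, d1, d{dm}[0]", fmt(encode_q_form(True, 0, 1, dm, 0)))
    for sm in (0, 16):
        print(f"vfmsl.f16 d0, s1, s{sm}[0]", fmt(encode_d_form(True, 0, 1, sm, 0)))
```

For the operands in the llvm-mc session, the sketch reproduces [0x50,0x08,0x11,0xfe] for d0[0] and [0x90,0x08,0x10,0xfe] for s0[0], and rejects d8, d16 and s16. Masking dm with 0x7 (or sm with 0xF) instead of range-checking would reproduce the old behavior, where all three Q-form lines collapse to the same bytes.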