This adds the full set of vector memory access instructions. It
includes contiguous loads/stores, with an ordinary addressing mode
such as [r0,#offset] (plus writeback variants); gather loads and
scatter stores with a scalar base address register and a vector of
offsets from it (written [r0,q1] or similar); and gather loads and
scatter stores with a vector of base addresses (written [q0,#offset],
again with writeback variants). Additionally, some of the loads can
widen each loaded value into a larger vector lane, and the
corresponding stores narrow each lane back to the memory element
size. Finally, there's the VLD2 / VLD4 family, which distributes 2 or
4 vectors' worth of memory data across the lanes of the same number
of registers, but in a transposed (de-interleaved) order.
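
As a rough illustration of the last two points, here is a minimal
Python model of the data movement only. It is a sketch under stated
assumptions, not code from this patch: the function names are
hypothetical, only byte-sized memory elements are modelled for the
widening case, and the de-interleave shows the net effect of a whole
VLD2/VLD4-style sequence rather than any single instruction.

```python
def widen_load(data: bytes, signed: bool) -> list[int]:
    """Model a widening load: each byte in memory becomes one wider lane,
    either sign-extended or zero-extended."""
    return [int.from_bytes(data[i:i + 1], "little", signed=signed)
            for i in range(len(data))]


def narrow_store(lanes: list[int]) -> bytes:
    """Model the matching narrowing store: keep the low byte of each lane."""
    return bytes(lane & 0xFF for lane in lanes)


def deinterleave(memory: list[int], num_regs: int) -> list[list[int]]:
    """Model the net effect of a 2- or 4-register de-interleaving load:
    register j receives memory elements j, j + num_regs, j + 2*num_regs, ..."""
    if len(memory) % num_regs != 0:
        raise ValueError("memory length must be a multiple of num_regs")
    lanes = len(memory) // num_regs
    return [[memory[lane * num_regs + reg] for lane in range(lanes)]
            for reg in range(num_regs)]


def interleave(regs: list[list[int]]) -> list[int]:
    """Inverse of deinterleave, modelling the matching interleaving store."""
    lanes = len(regs[0])
    return [regs[reg][lane] for lane in range(lanes)
            for reg in range(len(regs))]
```

For example, de-interleaving the eight elements 0..7 across two
registers puts the even-indexed elements in the first register and
the odd-indexed ones in the second, and interleave() restores the
original memory order.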

To implement these, we also have to add the addressing modes they
need, and the register list operands used by the VLD2/VLD4 family.
Also, in AsmParser, the isMem query function now has subqueries
isGPRMem and isMVEMem, according to which kind of base register
(scalar GPR or MVE vector register) a given memory access operand
uses.
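
The intent of that split can be sketched as follows. This is a
hypothetical Python model, not the AsmParser code: the MemOperand
class and the snake_case function names are assumptions made for
illustration, and real operands carry far more state than a base
register name.

```python
import re
from dataclasses import dataclass


@dataclass
class MemOperand:
    """Hypothetical stand-in for a parsed memory operand."""
    base: str  # base register name, e.g. "r0" for [r0,#4] or "q0" for [q0,#4]


def is_gpr_mem(op: MemOperand) -> bool:
    """True when the base is a scalar register (r0-r15 or sp/lr/pc aliases)."""
    return re.fullmatch(r"r(\d|1[0-5])|sp|lr|pc", op.base) is not None


def is_mve_mem(op: MemOperand) -> bool:
    """True when the base is an MVE vector register (q0-q7)."""
    return re.fullmatch(r"q[0-7]", op.base) is not None


def is_mem(op: MemOperand) -> bool:
    """The general query accepts either kind of base register."""
    return is_gpr_mem(op) or is_mve_mem(op)
```

Under this model, the operand in [r0,q1] has a scalar base and so
satisfies only the GPR subquery, while the operand in [q0,#offset]
satisfies only the MVE one.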