This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Is it worth giving a name to VecOf<Unsigned<Scalar>>?