This patch adds initial support for the following intrinsics:
- llvm.aarch64.sve.ld2
- llvm.aarch64.sve.ld3
- llvm.aarch64.sve.ld4
These load two, three and four vectors' worth of data respectively.
Basic codegen is implemented; the reg+reg and reg+imm addressing modes
will be addressed in a later patch.
The types returned by these intrinsics have N times the number of
elements in a 128-bit vector of the given element type, where N is the
number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit
elements the types are:
LD2 : <vscale x 8 x i32>
LD3 : <vscale x 12 x i32>
LD4 : <vscale x 16 x i32>
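
The rule above can be checked with a small sketch. This is only an illustration of the arithmetic, not code from the patch; the helper name sve_struct_load_type is hypothetical.

```python
def sve_struct_load_type(num_vectors: int, elt_bits: int, elt_name: str) -> str:
    """Return the IR type string for the result of an ldN intrinsic.

    Hypothetical helper: it only reproduces the element-count rule from
    the description, i.e. N * (elements in a 128-bit vector).
    """
    if num_vectors not in (2, 3, 4):
        raise ValueError("N must be 2, 3 or 4")
    if elt_bits <= 0 or 128 % elt_bits != 0:
        raise ValueError("element width must divide 128")
    elts_per_128_bits = 128 // elt_bits  # e.g. 4 for 32-bit elements
    return f"<vscale x {num_vectors * elts_per_128_bits} x {elt_name}>"


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"LD{n} :", sve_struct_load_type(n, 32, "i32"))
```

The same rule gives <vscale x 32 x i8> for LD2 with 8-bit elements.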
This is implemented with target-specific intrinsics for each variant
that take the same operands as the IR intrinsic but return N values,
where the type of each value is a full vector, i.e. <vscale x 4 x i32>
in the above example. These values are then concatenated using the
standard CONCAT_VECTORS node (ISD::CONCAT_VECTORS) to maintain type
legality with the IR.
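
As a rough model of the lowering described above, the sketch below splits a load into N full-vector results and concatenates them into the single wide value the IR expects. It assumes a fixed lane count (i.e. a fixed vscale), an all-true predicate, and the architectural de-interleaving behaviour of LDn; the function name model_ldn is hypothetical and none of this is code from the patch.

```python
def model_ldn(memory, n, lanes):
    """Model an ldN load as N per-vector results plus their concatenation.

    Assumptions (not from the patch): `lanes` is the number of elements in
    one full vector for a fixed vscale, and all lanes are active.
    """
    needed = n * lanes
    if n not in (2, 3, 4):
        raise ValueError("N must be 2, 3 or 4")
    if len(memory) < needed:
        raise ValueError("not enough elements in memory")
    # N separate results, each a full vector: result i takes every n-th
    # element starting at offset i (the de-interleave done by LDn).
    parts = [memory[i:needed:n] for i in range(n)]
    # Concatenate the N results into one wide value of n * lanes elements.
    wide = [x for part in parts for x in part]
    return parts, wide


if __name__ == "__main__":
    parts, wide = model_ldn(list(range(8)), n=2, lanes=4)
    print(parts)
    print(wide)
```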
These intrinsics are intended for use in the Arm C Language
Extensions (ACLE).
Question: you have three overloaded operands here. How come you only need to specify one of them in the intrinsic name?
If I look at one of your tests:
By the definition you have specified here, I was expecting to see the following intrinsic: @llvm.aarch64.sve.ld2.nxv32i8.nxv16i8.p0nxv16i8. Am I missing something?