# [IR] Add new intrinsics interleave and deinterleave vectorsClosedPublic

Authored by CarolineConcatto on Jan 17 2023, 5:41 AM.

# Details

Reviewers
 craig.topper fhahn paulwalker-arm efriedma reames sdesmalen mgabka
Commits
rGd515ecca6834: [IR] Add new intrinsics interleave and deinterleave vectors
Summary

This patch adds 2 new intrinsics:

```; Interleave two vectors into a wider vector
<vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)

; Deinterleave the odd and even lanes from a wider vector
{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)```

The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.

The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.

The format of the intrinsics matches how shufflevector is used in
LoopVectorize. For example:

```using cf = std::complex<float>;

void foo(cf * dst, int N) {
for (int i=0; i<N; ++i)
dst[i] += cf(1.f, 2.f);
}```

For this loop, LoopVectorize:

```(1) Loads a wide vector (e.g. <8 x float>)
(2) Extracts odd lanes using shufflevector (leading to <4 x float>)
(3) Extracts even lanes using shufflevector (leading to <4 x float>)
(4) Performs the addition
(5) Interleaves the two <4 x float> vectors into a single <8 x float> using
shufflevector
(6) Stores the wide vector.```

In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.

The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.

Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.

Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.

# Diff Detail

### Event Timeline

Herald added a project: Restricted Project. Jan 17 2023, 5:41 AM
Herald added subscribers: jdoerfert, hiraditya.
CarolineConcatto requested review of this revision.Jan 17 2023, 5:41 AM
Herald added a project: Restricted Project. Jan 17 2023, 5:41 AM

Just to add a bit of context here: The main motivator for adding these intrinsics is so that we can vectorise loops that have complex math using scalable vectors in a way that is simple to hook into the LoopVectorizer.

I'm aware there have been discussions about more generic representations of shuffles. These intrinsics are orthogonal to those discussions. They aim to solve a particular and current problem (vectorizing complex math) and follow the design principle that was taken for vector.insert/extract/reverse/splice, i.e. to have specific intrinsics for different shuffle patterns. We've done a fair bit of experimentation with different kinds of de/interleaving intrinsics. Some of the benefits we found with this proposed form is that:

• They can be trivially combined with loads/stores to get struct load/store instructions (e.g. LD2/ST2).
• They're simple to lower for both SVE and RVV, which have instructions to efficiently do pair-wise interleaving/deinterleaving.
• They're reasonably simple to type-legalize.
• They should be easy to hook into the LoopVectorizer.

I plan on doing a real review in a moment, but let me start with a high level comment.

I am one of the folks who thinks long term that we need a generalized means for describing shuffles with runtime masks. I had started a few months ago to put together a proposal for the same, but it stalled due to lack of attention. Despite this long term direction, I want to explicitly say that I do *not* oppose the addition of intrinsics in the near term to solve the same problem. We need to unblock progress here, and we can migrate at a later time to a more general solution if one comes along.

First, there was another take on this done in https://reviews.llvm.org/D134438. That approach tried to introduce interleaving stores and deinterleaving loads, whereas this one separates interleave into it's own set of intrinsics. I think I like this approach better if there if can be made to work cleanly.

Second, I think we can generalize your intrinsics to handle general interleave and deinterleave without much effort.

For the interleave case, we can simply allow an arbitrary number of vector arguments with matching vector types. If the input type is <vscale x N x ty> than the result is <vscale x A*N x ty> where A is the (compile time constant) number of arguments. We could even land the current definition and do this generalization in a follow on if desired.

For the deinterleave case, it's a bit trickier. I'd like to avoid specific odd/even versions. One option I see is to add two integer constant arguments to the intrinsic. The first would be the stride, the second would be the remainder. So, your "even" variant becomes deinterleave(vec, 2, 0). One piece that I'm not sure works here is that our result type needs to be a function of the type of the vector argument and the first integer argument. That may require some custom verification rules.

Given these are in the experimental namespace, I'd could probably be convinced that we should land these and then iterate.

llvm/include/llvm/CodeGen/ISDOpcodes.h
574

The choice here to represent the longer vector type as two vectors which are implicitly concatenated is interesting. Can you explain why you made that choice? Is it important for legalization?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11573

Is this actually the semantic of a extract_vector on scalable vectors? Given e.g. 2, I'd expect to get a vector starting at the 3rd element, not split at the high half.

11580

This should be createStrideMask in VectorUtils.h right?

11602

Comment says deinterleave, fix.

11605

createInterleaveMask - though, I think you chose a different strategy using the concat_vector here. Make sure you keep it consistent if you change one part.

reames added inline comments.Jan 17 2023, 8:54 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11605

ignore me on the second part of the comment here; I'd misread.

Reusing createInterleaveMask would still be good.

CarolineConcatto marked 4 inline comments as done.Jan 18 2023, 3:15 AM

Use createInterleaveMask and createStrideMask for fixed vector.

llvm/include/llvm/CodeGen/ISDOpcodes.h
574

It is more complicated when we have different sizes of input and output to legalise, keeping all inputs and outputs in the same size makes legalisation simpler.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11573

I don't know if I understand what you are writing about. But just in case this is my explanation:

The ISD Node for DEINTERLEAVE, needs the input vector divided by two. Splitting the input vector in half makes all operands have the same size(inputs and outputs). This makes the legalisation work less complicated.
For the deinterleave the output is always half the size of the input

```def int_experimental_vector_deinterleave_even : DefaultAttrsIntrinsic<[LLVMHalfElementsVectorType<0>],
[llvm_anyvector_ty],```

So if we split the input in half they should all have the same size when using ISD Node.

As far as I understand idx is a multiple of the known minimum vector length.
For instance:
<nxv2i64> extract_vector<nxv4i64>, the index can be only 0 or 2
<nxv4i32> extract_vector<nxv8i32>, the index can only be 0 or 4

So I believe the index is correct if we want to split the input vector in half.

/// EXTRACT_SUBVECTOR(VECTOR, IDX) - Returns a subvector from VECTOR.

```/// Let the result type be T, then IDX represents the starting element number
/// from which a subvector of type T is extracted. IDX must be a constant
/// multiple of T's known minimum vector length.```

Thanks for your feedback @reames

We could even land the current definition and do this generalization in a follow on if desired.

Great! That was kind of our reasoning with doing just a stride of 2, keep it simple at first and if there is need for it we can extend it to other strides as well.

For the interleave case, we can simply allow an arbitrary number of vector arguments with matching vector types. If the input type is <vscale x N x ty> than the result is <vscale x A*N x ty> where A is the (compile time constant) number of arguments.

For the deinterleave case, it's a bit trickier. I'd like to avoid specific odd/even versions. One option I see is to add two integer constant arguments to the intrinsic.

That is one of the things we experimented with and a reason to design the ISD nodes in this way, as they're easily extended for higher strides.
Legalisation for non-power-of-2 strides (like 3 or 5) gets a bit awkward though as it requires lots of insert/extract_subvector operations, some of them SVE does not yet support (we need to put in a bit more work to support nxv1* types), but we could have a cost-model to avoid choosing such strides in the LV.

One thing to keep in mind is that all targets must be able to lower these intrinsics even if they don't have dedicated instructions for such interleaves (e.g. when they can't be merged with load/store instructions). I think we can probably fall back to using gather/scatter to implement lowering of these operations for any stride.

The first would be the stride, the second would be the remainder. So, your "even" variant becomes deinterleave(vec, 2, 0). One piece that I'm not sure works here is that our result type needs to be a function of the type of the vector argument and the first integer argument. That may require some custom verification rules.

We experimented with just passing the 'offset'. The stride could be deduced from the types (e.g. if output is <vscale x 2 x i64> and input is <vscale x 8 x i64>, then the stride is 4), and the offset would tell at what element to start deinterleaving (it would be a value 0 <= offset < stride).

llvm/include/llvm/CodeGen/ISDOpcodes.h
574

That's right, legalisation becomes simpler when all the types are the same. For example, when the input vector is illegal (too wide) but the vector output is legal, we'd need to split the operation into two.

```// Assuming that nxv4i32 is legal, and nxv8i32 needs splitting
nxv4i32 deinterleave(nxv8i32, 0)
->
// Now the input vector is legal, but the output type of deinterleave (nxv2i32)
// is illegal and the operation needs further promotion or widening
nxv4i32 concat(nxv2i32 deinterleave(nxv4i32 extract_lo(nxv8i32), 0),
nxv2i32 deinterleave(nxv4i32 extract_hi(nxv8i32), 0))```

Whereas, if we split the vector such that we have nxv4i32 deinterleave(nxv4i32, nxv4i32, 0), then all the types are legal (or illegal, when using a different example) at the same time.

Matt added a subscriber: Matt.Jan 18 2023, 2:07 PM

Hi @CarolineConcatto & @sdesmalen, I've a couple of simplifications for you to consider which I believe makes things easier to work with. Perhaps simplifications is the wrong word but I do think they'll introduce more uniformity to the design.

Intrinsics:
What about implementing total shuffles and breaking the dependence on vector types having any meaning, with this encoded as a discrete immediate operand instead. For example:

```Ty A = @llvm.experimental.vector.interleave.Ty(Vec, shape)
Ty B = @llvm.experimental.vector.deinterleave.Ty(Vec, shape)```

Here the intrinsics are simply vector in vector out with all input lanes existing in the output just at a different location (this is what I mean by a total shuffle). If only part of the result is important to the caller then they'll just extract the part they need. Here shape essentially refers to the number of subvectors that are logically contained within Vec(interleave) or B(deinterleave) and for this initial implementation we'd restrict support to just the value 2. The main usage rule is Ty.getKnownMinElementCount() must be devisable by shape.

Do you see any issues here? My thinking is that it becomes trivial to see how we'd support other strides (i.e. we'd just extend the verifier to allow shape=new-stride).

CodeGen:
As you know the vector types here are critical because we must be able to legalise all supported variants. I prefer how you've defined ISD::VECTOR_INTERLEAVE over ISD::VECTOR_DEINTERLEAVE because it represents a total shuffle.

However, my proposal is to match the above intrinsic interface but replace the single vector in/out rule with one that dictates the ISD nodes must have a matching number of vector inputs and outputs with all having the same type. The shape operand remains as is and the operations are defined to first concatenate all N vector operands, perform the necessary shuffle (based on the shape), before the result is then evenly split into N vectors.

The important thing here is that shape does not dictate anything about the number of vectors. Once all type legalisation is in place I'd expect a simple mapping from intrinsic to ISD node. After type legalisation the vector counts might change but shape does not. I think the shape gives enough information to guide type/operation legalisation in the best order to split, promote or widen the vectors?

I don't think this deviates much from your current design but does provide more extensibility. Given this patch is not worrying about type legalisation the biggest change is likely to be to the operation descriptions, but I'd appreciate a little tire kicking to see if I'm misrepresenting the benefits. What do you think?

Paul,
Let me know if I understood correctly.
You are suggesting that:

1. Intrinsic have only 1 vec input and 1 vec output.
```<vscale x 4 x i64> @llvm.experimental.vector.deinterleave.nxv4i64(<vscale x 4 x i64> %vec, i64 2)
<vscale x 4 x i64> @llvm.experimental.vector.interleave.nxv4i64(<vscale x 4 x i64> %vec, i64 2)```
1. ISD node have shapes in and out vectors. For instance, if shape is 2:
```RES0, RES1 = DEINTERLEAVE (VEC0, VEC1)
RES0, RES1 = INTERLEAVE (VEC0, VEC1)```

and all the vector sizes(input and output) are equal

If I understood it correctly, these are my 2 cents:
A)This solution is more configurable than the one we are proposing. So the intrinsic verifier needs to have rules to avoid mistakes like:

`A.1) Shape not proportional to the minimum number of elements(As you wrote:`

Ty.getKnownMinElementCount() must be devisable by shape

.)

For instance:
It should not be possible to do:

`<vscale x 7 x i64> @llvm.experimental.vector.deinterleave.nxv7i64(<vscale x 7 x i64> %vec, i64 2)`

At the moment the intrinsic is checked by tablegen(TargetSelectionDAG), but as you proposed we need to add code for it in Verifier.cpp. I believe I cannot use tablegen to make sure that shape is proportional to the minimum number of elements

`A.2) Shapes that are not supported. Like:`
`<vscale x 12 x i64> @llvm.experimental.vector.interleave.nxv12i64(<vscale x 12 x i64> %vec, i64 4)`

or

`<vscale x 6 x i64> @llvm.experimental.vector.interleave.nxv7i64(<vscale x 6 x i64> %vec, i64 3)`

B) For DEINTERLEAVE we can mix even and odd vectors when extracting without being very clear.
(If I could give my suggestion too, we should not return deinterleave as a concatenated vector, but as a struct of N vectors(2 in this case))
The reason is that:
Imagine we want to deinterleave a vector like <v6i64><i64 0, i641, i64 2, i64 3, i64 4, i64 5>, the result stored in <v6i64>Res and with a shape equal to 2. It should also split the vector into even and odds elements.
So the llvm-IR would be something like:

```Res = <v6i64>@llvm.experimental.vector.deinterleave.v6i64(<v6i64><i64 0, i641, i64 2, i64 3, i64 4, i64 5>, i64 2)
Even= <v2i64> @llvm.extract.vector.v2i64(<v6i64> %Res, i64 0)
Odd = <v4i64> @llvm.extract.vector.v4i64(<v6i64> %Res, i64 2)```

The return vector is:
Res =<v6i64><i64 0, i64 2, i64 4, i64 1, i64 3, i64 5>
and Even and Odd are:
Even= <v2i64><i64 0, i64 2>
Odd =<v4i64><i64 4,i64 1, i64 3, i64 5>
We still have 2 vectors, but wrongly split.

The same would not happen if the deinterleave returns a struct with even and odd vectors. We would need to do something like this to have even and odd:

```Res = {<v3i64>, <v3i64>} @llvm.experimental.vector.deinterleave.v3i64(<v6i64><i64 0, i641, i64 2, i64 3, i64 4, i64 5>, i64 2)
Even= <3i64> @llvm.extract.element.v3i64({<v3i64>, <3i64> }%Res, i64 0)
Odd = <3i64> @llvm.extract.element.v3i64({<v3i64>, <v3i64>} %Res, i64 1)```

Res = {<v3i64><i64 0, i64 2, i64 4>,<v3i64><i64 1, i64 3, i645>}
and Even and Odd are:
Even= <v3i64><i64 0, i64 2, i64 4>
Odd =<v3i64><i64 1, i64 3, i64 5>

I believe that the following is not possible, without having to concatenate after the intrinsic call. Which, IMHO, makes it clear the intention/goal.

```Even = <v2i64> @llvm.extract.vector.v2i64(<v3i64> %Res, i64 0)
Odd = <v4i64> @llvm.extract.vector.v4i64(<v3i64> %Res, i64 2)```

And if we want to be similar to a fixed vector. The mask for deinterleave only returns 1 vector, which could be odd or even and not all concatenated.

If all vectors in the ISDNode are the same size I don't foresee any problem with the legalization. But if they are different, then it becomes complicated, as explained before.

To put it in a nutshell, I would say:
I am fine updating as you suggested, I do not foresee many problems. But I would like us to agree on one suggestion/solution before doing more significant changes.

What about implementing total shuffles and breaking the dependence on vector types having any meaning, with this encoded as a discrete immediate operand instead. For example:

```Ty A = @llvm.experimental.vector.interleave.Ty(Vec, shape)
Ty B = @llvm.experimental.vector.deinterleave.Ty(Vec, shape)```

Here the intrinsics are simply vector in vector out with all input lanes existing in the output just at a different location (this is what I mean by a total shuffle). If only part of the result is important to the caller then they'll just extract the part they need. Here shape essentially refers to the number of subvectors that are logically contained within Vec(interleave) or B(deinterleave) and for this initial implementation we'd restrict support to just the value 2. The main usage rule is Ty.getKnownMinElementCount() must be devisable by shape.

Do you see any issues here? My thinking is that it becomes trivial to see how we'd support other strides (i.e. we'd just extend the verifier to allow shape=new-stride).

Interleaving requires two input vectors, so in order to do an interleave operation with the single-operand/result intrinsic, two input vectors will need to be concatenated (in IR) using llvm.vector.insert operations. There may be issues when these operations are hoisted to a different block, in the sense that the EXTRACT_SUBVECTOR nodes inserted as part of lowering to the ISD nodes are not folded away and result in actual code.

If we implement a 'total shuffle' with multiple operands/results (resulting in both lo/hi for interleave and even/odd for deinterleave), then we avoid having to insert artificial vector.insert/extract operations and it also becomes more clear which part of the calculation is used. Individual results are stored in separate virtual registers which I suspect may make it easier for the compiler to know what parts to deadcode, especially when the 'concat' of the results would otherwise lead to actual instructions (e.g. for SVE concat(<vscale x 2 x i32>,<vscale x 2 x i32>) -> <vscale x 4 x i32>).

Do you see any specific advantages to having a single vector operand/result?

paulwalker-arm added a comment.EditedFeb 1 2023, 4:45 AM

Hi @CarolineConcatto & @sdesmalen, sorry for the delay in responding (I had a phabricator free week :) ). The above all sounds sensible to me. In a way I'm playing devils advocate based on the original responses because the one advantage of having single input single output intrinsics is that it allows one intrinsic to be used (or rather extended) to allow for future shapes. It's clear from your responses, that whilst possible, such a design is likely impractical. To me this means we can drop all pretence of worrying about other shapes because we're concluding each shape is best handled via a dedicated intrinsic (and presumable ISD nodes). This is great news because it keeps things simple. However, we should avoid strictly modelling the intrinsics on how we expect AArch64 code generation to look and so I still prefer the total shuffle representation, which by the sounds of it you are both happy with?

I believe that leaves us with:

```{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2(<vscale x 4 x i64>)
<vscale x 4 x i64> @llvm.experimental.vector.interleave2(<vscale x 2 x i64>, <vscale x 2 x i64>)```

Which also quite nicely fits with how LoopVectorize works whereby it typically wants "big_load->deinterleave" and "interleave->big_store" idioms.

The ISD nodes presumably follow a similar pattern albeit with the double width vectors most likely represented as multiple vectors due to the way legalisation works. Here I think the ISD nodes will still benefit by allowing an arbitrary number of vector inputs and outputs but if we're locking the ISD nodes to a specific shape then I'm less concerned about that because the nodes can be extended after this initial implementation if need be.

Allen added a subscriber: Allen.Feb 2 2023, 1:35 AM
CarolineConcatto edited the summary of this revision. (Show Details)

Address review comments.

• Change the deinterleave intrinsic and ISD Node to return odd and even
• Add to the intrinsic names the stride 2:
```	     experimental.vector.deinterleave  to experimental.vector.deinterleave2
experimental.vector.interleave  to experimental.vector.interleave2```
paulwalker-arm added inline comments.Feb 5 2023, 5:04 AM
llvm/include/llvm/IR/Intrinsics.td
2125–2133

These intrinsic definitions don't look to match the langref description or tests.

The text says:

`declare <4 x double> @llvm.experimental.vector.interleave2.v2f64(<2 x double> %vec1, <2 x double> %vec2)`

making v2f64 the overloaded type from which the others (i.e. the <4 x double> return type) are derived. However, int_experimental_vector_interleave2 is defined such that the return type is the overloaded type. You can see this if you run your tests through opt. For example, interleave2_nxv2f16 becomes:

```define <vscale x 4 x half> @interleave2_nxv2f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1) #0 {
%retval = call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1)
ret <vscale x 4 x half> %retval
}```

For what it's worth I think the text is correct because overloading the smaller type will look more consistent when adding other shapes. Which is to say I have a slight preference for:

```<vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv2f16(<vscale x 2 x half>...
<vscale x 6 x half> @llvm.experimental.vector.interleave3.nxv2f16(<vscale x 2 x half>...```

That will mean adding LLVMDoubleElementsVectorType though, unless there's already a class that'll do what we need.

The overloaded type size in the function for the examples in LangRef and in the test files were with the incorrect. The overloaded type size is showing as the smallest type size in the function. Problem is that there is only
LLVMHalfElementsVectorType to describe the dependencies between the type sizes.
So the overloaded type size needs to be the biggest type size in the function.

llvm/include/llvm/IR/Intrinsics.td
2125–2133

I believe we want to keep like this and not create a LLVMDoubleElementsVectorType. AFAIU we should use what we already have available in LLVM. More over if we want more strides in the future we may not even have to use LLVMHalfElementsVectorType.
But again, if you have a good argument about the use of LLVMDoubleElementsVectorType I can change. ATM I only update the test and the examples to use the biggest size type as overloaded.

Just want to say explicitly that I'm fine with the direction this has evolved, and that once review completes, I have no issue with this landing.

craig.topper added inline comments.Feb 9 2023, 12:07 AM
llvm/docs/LangRef.rst
17722

work -> works

17757

work -> works

paulwalker-arm added a comment.EditedFeb 9 2023, 8:55 AM

Hi @CarolineConcatto, it's fair to say most of my comments here are likely petty and relate more to the documentation side of things where I'd rather be more explicit in describing the operations being added. That said, they're all just suggestions for you to decide what to do with, if anything. That just leaves the way OutVT is calculated and the use of getVTList() as changes I more strongly encourage. Otherwise I think the patch looks great.

llvm/docs/LangRef.rst
17719

"constructs"

I've offered an alternate description of the ISD nodes that might be worth adapting for the intrinsic text if you think there's value.

17722–17723

Perhaps just "While this intrinsic supports all vector types the recommended...."?

17736

"The argument is a vector whose type corresponds to the logical concatenation of the two result types."?

17757

As above.

17770–17771

"Both arguments must be vectors of the same type whereby their logical concatenation matches the result type."?

llvm/include/llvm/CodeGen/ISDOpcodes.h
574–577

It's worth being explicit as to what "from VEC1 and VEC2" means. Perhaps:

`Returns two vectors with all input and output vectors having the same type. The first output contains the even indices from CONCAT_VECTORS(VEC1, VEC2), with the second output containing the odd indices. The relative order of elements within an output match that of the concatenated input.`
580–584

Similar to the above, perhaps:

`Returns two vectors with all input and output vectors having the same type. The first output contains the result of interleaving the low half of CONCAT_VECTORS(VEC1, VEC2), with the second output containing the result of interleaving the high half.`
llvm/include/llvm/IR/Intrinsics.td
2125–2133

You'll need to extend the overloaded type support anyway when adding support for other strides so I don't really by the argument. That said, the type to overload on is subjective so if you prefer to use the bigger type then sure I can live with that.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11565–11566

Given how the intrinsic is defined, what do you think of using EVT OutVT = InVec.getValueType().getHalfNumVectorElementsVT(); rather than the more expensive looking call to ComputeValueVTs.

11576

"Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing legalisation and combines."

11579

It's better to be specific, so OutVTs[0].isFixedLengthVector().

11598

Not that bothered but does the array gives us anything useful over the more typical:

```SDValue InVec1 = getValue(I.getOperand(0));
SDValue InVec2 = getValue(I.getOperand(1));```
11602–11604

Similar typos as mentioned in visitVectorDeinterleave.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24070

I think Expected scalable vector better reflects the assert.

24075

I think Op->getVTList() should work here?

24084

Same comment as with LowerVECTOR_DEINTERLEAVE.

24090

Same comment as with LowerVECTOR_DEINTERLEAVE.

llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
124–126

I don't think this test offers any value. It's really showing how VECTOR_SHUFFLE is legalised, which this patch doesn't care about.

llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll
122

As with the deinterleave case I don't think this is testing anything the patch really cares about and my worry it'll just cause unnecessary pain for unrelated patches. If we plan to significantly improve the code generation then fine but if not then perhaps they're best removed?

llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
13

Please can you indent the IR here and for the other functions in this file.style

luke added a subscriber: luke.Feb 9 2023, 10:38 AM
luke added inline comments.Feb 10 2023, 6:08 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24075

Could you use DAG.getMergeValues here too?

CarolineConcatto marked 7 inline comments as done.
• Address review comments
• Indent sve tests
CarolineConcatto marked 4 inline comments as done.
• Remove .patch file
CarolineConcatto marked 10 inline comments as done.
• Remove unwanted changes in LangRef for vector-reverse

Thank you all for the suggestion.
I believe I have addressed all of them.
Carol

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11565–11566

Thank you, I was not aware of this function.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24075

Thank you Luke, also missed that function when looking at MERGE.

Fix LangRef for deinterleave and interleave arguments.
Fix extra space in sve-vector-deinterleave.ll

paulwalker-arm accepted this revision.Feb 13 2023, 11:01 AM

A couple of minor requests but otherwise the patch looks good.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571

It's better to use getVectorIdxConstant here.

11573

As above.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1231–1232

No change required, the following is just an observational comment.

I just wanted to highlight there are no tests for the MVT::nxv1i1 cases, which I'm guessing triggers an isel failure as there's as yet no support for the zip and uzp nodes for this type? Given legalisation for these nodes will follow this patch I'm assuming there isn't a route that is crash free today.

llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
44

Can you move this function before vector_deinterleave_v2f32_v4f32 to keep the related element types together.

llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
41

Can you move this function before vector_deinterleave_nxv2f32_nxv4f32 to keep the related element types together.

This revision is now accepted and ready to land.Feb 13 2023, 11:01 AM
luke added inline comments.Feb 15 2023, 3:51 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571

Seconded, I'm working on a patch to lower these intrinsics to RISC-V and it currently throws an assertion here on RV32 as it doesn't support MVT::i64 constants.

CarolineConcatto marked 2 inline comments as done.Feb 15 2023, 5:21 AM
CarolineConcatto added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571

Thank you Paul,
I did not know about this function.

11571

@luke Does it solved your problem if we replace by getVectorIdxConstant like suggest by Paul?

luke added inline comments.Feb 15 2023, 5:32 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571

Yeah, I tried it out locally and it fixes it!

• Use in VisitVectorDeinterleave getVectorIdxConstant to the Index in VECTOR_EXTRACT
• Address nit in yhe test files
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571

Wonderfull, I will update the patch!

This revision was landed with ongoing or failed builds.Feb 20 2023, 4:44 AM
This revision was automatically updated to reflect the committed changes.