This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/docs/LangRef.rst
llvm/include/llvm/CodeGen/ISDOpcodes.h
llvm/include/llvm/IR/Intrinsics.td
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
llvm/lib/Target/AArch64/AArch64ISelLowering.h
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll
llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
llvm/test/CodeGen/AArch64/sve-vector-interleave.ll

Differential D141924

[IR] Add new intrinsics interleave and deinterleave vectors
ClosedPublic

Authored by CarolineConcatto on Jan 17 2023, 5:41 AM.

Details

Reviewers

craig.topper
fhahn
paulwalker-arm
efriedma
reames
sdesmalen
mgabka

Commits

rGd515ecca6834: [IR] Add new intrinsics interleave and deinterleave vectors

Summary

This patch adds 2 new intrinsics:

; Interleave two vectors into a wider vector
<vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)

; Deinterleave the odd and even lanes from a wider vector
{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)

The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.

The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types, which have only a real and an imaginary component.
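
As a rough illustration of the lane ordering these two intrinsics describe, here is a minimal sketch on fixed-length `std::vector`s. The helper names `interleave2` and `deinterleave2` are hypothetical stand-ins chosen to mirror the intrinsic names. They model only the element order, not scalable vectors or anything about the actual lowering.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of interleave2: result[2*i] = even[i], result[2*i+1] = odd[i].
template <typename T>
std::vector<T> interleave2(const std::vector<T> &even, const std::vector<T> &odd) {
  assert(even.size() == odd.size() && "both operands must have the same length");
  std::vector<T> result;
  result.reserve(even.size() * 2);
  for (std::size_t i = 0; i < even.size(); ++i) {
    result.push_back(even[i]);
    result.push_back(odd[i]);
  }
  return result;
}

// Hypothetical model of deinterleave2: returns {even lanes, odd lanes} of vec.
template <typename T>
std::pair<std::vector<T>, std::vector<T>> deinterleave2(const std::vector<T> &vec) {
  assert(vec.size() % 2 == 0 && "input must have an even number of lanes");
  std::vector<T> even, odd;
  for (std::size_t i = 0; i < vec.size(); i += 2) {
    even.push_back(vec[i]);
    odd.push_back(vec[i + 1]);
  }
  return {even, odd};
}
```

In this model the two operations are inverses of each other: deinterleaving the result of an interleave returns the original two operands.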

The format of the intrinsics matches how shufflevector is used in
LoopVectorize. For example:

using cf = std::complex<float>;

void foo(cf * dst, int N) {
    for (int i=0; i<N; ++i)
        dst[i] += cf(1.f, 2.f);
}

For this loop, LoopVectorize:

(1) Loads a wide vector (e.g. <8 x float>)
(2) Extracts odd lanes using shufflevector (leading to <4 x float>)
(3) Extracts even lanes using shufflevector (leading to <4 x float>)
(4) Performs the addition
(5) Interleaves the two <4 x float> vectors into a single <8 x float> using
    shufflevector
(6) Stores the wide vector.

In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.
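
The six steps above can be sketched as a scalar model of one vectorized iteration. This is only an illustration: `addConstantInterleaved` is a hypothetical name, the "wide vector" is a plain `std::vector<float>` holding (re, im) pairs, and the load and store in steps (1) and (6) are left to the caller.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;

// Hypothetical model of one vector iteration of foo(): `wide` is the loaded
// wide vector, laid out as re0, im0, re1, im1, ...
std::vector<float> addConstantInterleaved(const std::vector<float> &wide, cf c) {
  assert(wide.size() % 2 == 0 && "expected (re, im) pairs");
  const std::size_t vf = wide.size() / 2;

  // Steps (2) and (3): deinterleave into even (real) and odd (imaginary) lanes.
  std::vector<float> re(vf), im(vf);
  for (std::size_t i = 0; i < vf; ++i) {
    re[i] = wide[2 * i];
    im[i] = wide[2 * i + 1];
  }

  // Step (4): perform the addition on each half.
  for (std::size_t i = 0; i < vf; ++i) {
    re[i] += c.real();
    im[i] += c.imag();
  }

  // Step (5): interleave the two halves back into one wide vector.
  std::vector<float> out(wide.size());
  for (std::size_t i = 0; i < vf; ++i) {
    out[2 * i] = re[i];
    out[2 * i + 1] = im[i];
  }
  return out;
}
```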

The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.

Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so as to benefit from
existing code that was written to recognize/optimize shufflevector patterns.
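
For fixed-width types the shuffle masks involved are simple index sequences. The sketch below is a standalone model, not the LLVM code: `strideMask` and `interleaveMask` are hypothetical helpers loosely modelled on the `createStrideMask` and `createInterleaveMask` utilities mentioned later in the review, and `shuffleLanes` assumes a shufflevector-like convention in which mask indices address the lanes of the first operand followed by the lanes of the second.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices start, start+stride, ... (vf entries): picks one member of each group.
std::vector<int> strideMask(unsigned start, unsigned stride, unsigned vf) {
  std::vector<int> mask;
  for (unsigned i = 0; i < vf; ++i)
    mask.push_back(static_cast<int>(start + i * stride));
  return mask;
}

// Indices that interleave numVecs concatenated vectors of vf lanes each.
std::vector<int> interleaveMask(unsigned vf, unsigned numVecs) {
  std::vector<int> mask;
  for (unsigned i = 0; i < vf; ++i)
    for (unsigned j = 0; j < numVecs; ++j)
      mask.push_back(static_cast<int>(j * vf + i));
  return mask;
}

// Assumed shuffle model: mask index k selects lane k of the concatenation a ++ b.
std::vector<int> shuffleLanes(const std::vector<int> &a, const std::vector<int> &b,
                              const std::vector<int> &mask) {
  std::vector<int> out;
  for (int idx : mask) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < a.size() + b.size());
    const std::size_t i = static_cast<std::size_t>(idx);
    out.push_back(i < a.size() ? a[i] : b[i - a.size()]);
  }
  return out;
}
```

With these, a stride-2 deinterleave of an 8-lane vector uses `strideMask(0, 2, 4)` and `strideMask(1, 2, 4)`, and interleaving two 4-lane vectors uses `interleaveMask(4, 2)`.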

Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

CarolineConcatto created this revision. Jan 17 2023, 5:41 AM

Herald added a project: Restricted Project. Jan 17 2023, 5:41 AM

Herald added subscribers: jdoerfert, hiraditya.

CarolineConcatto requested review of this revision. Jan 17 2023, 5:41 AM

Herald added subscribers: llvm-commits, pcwang-thead, alextsao1999.

CarolineConcatto added reviewers: craig.topper, fhahn, paulwalker-arm, efriedma, reames, sdesmalen, mgabka. Jan 17 2023, 5:48 AM

Herald added a subscriber: StephenFan. Jan 17 2023, 5:48 AM

Just to add a bit of context here: The main motivator for adding these intrinsics is so that we can vectorise loops that have complex math using scalable vectors in a way that is simple to hook into the LoopVectorizer.

I'm aware there have been discussions about more generic representations of shuffles. These intrinsics are orthogonal to those discussions. They aim to solve a particular and current problem (vectorizing complex math) and follow the design principle that was taken for vector.insert/extract/reverse/splice, i.e. to have specific intrinsics for different shuffle patterns. We've done a fair bit of experimentation with different kinds of de/interleaving intrinsics. Some of the benefits we found with this proposed form are that:

They can be trivially combined with loads/stores to get struct load/store instructions (e.g. LD2/ST2).
They're simple to lower for both SVE and RVV, which have instructions to efficiently do pair-wise interleaving/deinterleaving.
They're reasonably simple to type-legalize.
They should be easy to hook into the LoopVectorizer.

Harbormaster completed remote builds in B208222: Diff 489784. Jan 17 2023, 7:17 AM

I plan on doing a real review in a moment, but let me start with a high level comment.

I am one of the folks who thinks long term that we need a generalized means for describing shuffles with runtime masks. I had started a few months ago to put together a proposal for the same, but it stalled due to lack of attention. Despite this long term direction, I want to explicitly say that I do *not* oppose the addition of intrinsics in the near term to solve the same problem. We need to unblock progress here, and we can migrate at a later time to a more general solution if one comes along.

mgabka added a subscriber: igor.kirillov. Jan 17 2023, 8:42 AM

First, there was another take on this done in https://reviews.llvm.org/D134438. That approach tried to introduce interleaving stores and deinterleaving loads, whereas this one separates interleave into its own set of intrinsics. I think I like this approach better if it can be made to work cleanly.

Second, I think we can generalize your intrinsics to handle general interleave and deinterleave without much effort.

For the interleave case, we can simply allow an arbitrary number of vector arguments with matching vector types. If the input type is <vscale x N x ty> then the result is <vscale x A*N x ty> where A is the (compile time constant) number of arguments. We could even land the current definition and do this generalization in a follow-on if desired.
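
To make the suggested generalization concrete, here is a small sketch of an interleave over A inputs of N lanes each, producing A*N lanes. The name `interleaveN` is hypothetical and fixed-length `std::vector`s stand in for the vector types. It illustrates the proposal only, not anything the patch implements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical N-ary interleave: out[i * A + j] = inputs[j][i],
// where A = inputs.size() and every input has the same length.
template <typename T>
std::vector<T> interleaveN(const std::vector<std::vector<T>> &inputs) {
  assert(!inputs.empty() && "need at least one input vector");
  const std::size_t n = inputs.front().size();
  for (const auto &v : inputs) {
    assert(v.size() == n && "all inputs must have matching types");
    (void)v;
  }
  std::vector<T> out;
  out.reserve(n * inputs.size());
  for (std::size_t i = 0; i < n; ++i)
    for (const auto &v : inputs)
      out.push_back(v[i]);
  return out;
}
```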

For the deinterleave case, it's a bit trickier. I'd like to avoid specific odd/even versions. One option I see is to add two integer constant arguments to the intrinsic. The first would be the stride, the second would be the remainder. So, your "even" variant becomes deinterleave(vec, 2, 0). One piece that I'm not sure works here is that our result type needs to be a function of the type of the vector argument and the first integer argument. That may require some custom verification rules.
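
A sketch of the stride/remainder form described here, again on fixed-length vectors. `deinterleaveStrided` is a hypothetical name, and the checks correspond to the kind of rules a verifier would have to enforce. It also shows that the result length depends on both the input length and the stride argument, which is the typing concern raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical deinterleave(vec, stride, remainder):
// keeps lanes remainder, remainder + stride, remainder + 2 * stride, ...
template <typename T>
std::vector<T> deinterleaveStrided(const std::vector<T> &vec, std::size_t stride,
                                   std::size_t remainder) {
  assert(stride > 0 && remainder < stride && "remainder must be in [0, stride)");
  assert(vec.size() % stride == 0 && "lane count must be a multiple of stride");
  std::vector<T> out;
  out.reserve(vec.size() / stride);
  for (std::size_t i = remainder; i < vec.size(); i += stride)
    out.push_back(vec[i]);
  return out;
}
```

In this form the "even" variant is `deinterleaveStrided(vec, 2, 0)` and the "odd" variant is `deinterleaveStrided(vec, 2, 1)`.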

Given these are in the experimental namespace, I could probably be convinced that we should land these and then iterate.

llvm/include/llvm/CodeGen/ISDOpcodes.h
574	The choice here to represent the longer vector type as two vectors which are implicitly concatenated is interesting. Can you explain why you made that choice? Is it important for legalization?
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11573	Is this actually the semantic of a extract_vector on scalable vectors? Given e.g. 2, I'd expect to get a vector starting at the 3rd element, not split at the high half.
11580	This should be createStrideMask in VectorUtils.h right?
11602	Comment says deinterleave, fix.
11605	createInterleaveMask - though, I think you chose a different strategy using the concat_vector here. Make sure you keep it consistent if you change one part.

reames added inline comments. Jan 17 2023, 8:54 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11605	ignore me on the second part of the comment here; I'd misread. Reusing createInterleaveMask would still be good.

CarolineConcatto updated this revision to Diff 490097. Jan 18 2023, 3:15 AM

Use createInterleaveMask and createStrideMask for fixed vector.

llvm/include/llvm/CodeGen/ISDOpcodes.h
574	It is more complicated when we have different sizes of input and output to legalise, keeping all inputs and outputs in the same size makes legalisation simpler.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11573	I don't know if I understand what you are writing about. But just in case, this is my explanation: The ISD node for DEINTERLEAVE needs the input vector divided by two. Splitting the input vector in half makes all operands have the same size (inputs and outputs). This makes the legalisation work less complicated. For the deinterleave the output is always half the size of the input:

def int_experimental_vector_deinterleave_even : DefaultAttrsIntrinsic<[LLVMHalfElementsVectorType<0>], [llvm_anyvector_ty],

So if we split the input in half they should all have the same size when using the ISD node. As far as I understand, idx is a multiple of the known minimum vector length. For instance:

<nxv2i64> extract_vector<nxv4i64>, the index can only be 0 or 2
<nxv4i32> extract_vector<nxv8i32>, the index can only be 0 or 4

So I believe the index is correct if we want to split the input vector in half.

/// EXTRACT_SUBVECTOR(VECTOR, IDX) - Returns a subvector from VECTOR.
/// Let the result type be T, then IDX represents the starting element number
/// from which a subvector of type T is extracted. IDX must be a constant
/// multiple of T's known minimum vector length.

Harbormaster completed remote builds in B208451: Diff 490097. Jan 18 2023, 4:29 AM

Thanks for your feedback @reames

We could even land the current definition and do this generalization in a follow on if desired.

Great! That was kind of our reasoning with doing just a stride of 2, keep it simple at first and if there is need for it we can extend it to other strides as well.

For the interleave case, we can simply allow an arbitrary number of vector arguments with matching vector types. If the input type is <vscale x N x ty> then the result is <vscale x A*N x ty> where A is the (compile time constant) number of arguments.

For the deinterleave case, it's a bit trickier. I'd like to avoid specific odd/even versions. One option I see is to add two integer constant arguments to the intrinsic.

That is one of the things we experimented with and a reason to design the ISD nodes in this way, as they're easily extended for higher strides.
Legalisation for non-power-of-2 strides (like 3 or 5) gets a bit awkward though as it requires lots of insert/extract_subvector operations, some of them SVE does not yet support (we need to put in a bit more work to support nxv1* types), but we could have a cost-model to avoid choosing such strides in the LV.

One thing to keep in mind is that all targets must be able to lower these intrinsics even if they don't have dedicated instructions for such interleaves (e.g. when they can't be merged with load/store instructions). I think we can probably fall back to using gather/scatter to implement lowering of these operations for any stride.

The first would be the stride, the second would be the remainder. So, your "even" variant becomes deinterleave(vec, 2, 0). One piece that I'm not sure works here is that our result type needs to be a function of the type of the vector argument and the first integer argument. That may require some custom verification rules.

We experimented with just passing the 'offset'. The stride could be deduced from the types (e.g. if output is <vscale x 2 x i64> and input is <vscale x 8 x i64>, then the stride is 4), and the offset would tell at what element to start deinterleaving (it would be a value 0 <= offset < stride).
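
The arithmetic behind deducing the stride from the types is small enough to sketch directly. `deduceStride` and `isValidOffset` are hypothetical helpers operating on known-minimum element counts. They are not part of LLVM.

```cpp
// Hypothetical helper: stride implied by the input/output minimum element
// counts, or 0 if the output count does not evenly divide the input count.
unsigned deduceStride(unsigned inMinElts, unsigned outMinElts) {
  if (outMinElts == 0 || inMinElts % outMinElts != 0)
    return 0;
  return inMinElts / outMinElts;
}

// The offset selects the first lane to take, so it must satisfy 0 <= offset < stride.
bool isValidOffset(unsigned offset, unsigned stride) {
  return stride != 0 && offset < stride;
}
```

For the example in the comment, an input of `<vscale x 8 x i64>` and an output of `<vscale x 2 x i64>` give `deduceStride(8, 2)`, i.e. a stride of 4, with valid offsets 0 through 3.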

llvm/include/llvm/CodeGen/ISDOpcodes.h
574	That's right, legalisation becomes simpler when all the types are the same. For example, when the input vector is illegal (too wide) but the vector output is legal, we'd need to split the operation into two.

// Assuming that nxv4i32 is legal, and nxv8i32 needs splitting
nxv4i32 deinterleave(nxv8i32, 0)
->
// Now the input vector is legal, but the output type of deinterleave (nxv2i32)
// is illegal and the operation needs further promotion or widening
nxv4i32 concat(nxv2i32 deinterleave(nxv4i32 extract_lo(nxv8i32), 0),
               nxv2i32 deinterleave(nxv4i32 extract_hi(nxv8i32), 0))

Whereas, if we split the vector such that we have `nxv4i32 deinterleave(nxv4i32, nxv4i32, 0)`, then all the types are legal (or illegal, when using a different example) at the same time.

Matt added a subscriber: Matt. Jan 18 2023, 2:07 PM

Hi @CarolineConcatto & @sdesmalen, I've a couple of simplifications for you to consider which I believe makes things easier to work with. Perhaps simplifications is the wrong word but I do think they'll introduce more uniformity to the design.

Intrinsics:
What about implementing total shuffles and breaking the dependence on vector types having any meaning, with this encoded as a discrete immediate operand instead. For example:

Ty A = @llvm.experimental.vector.interleave.Ty(Vec, shape)
Ty B = @llvm.experimental.vector.deinterleave.Ty(Vec, shape)

Here the intrinsics are simply vector in vector out with all input lanes existing in the output just at a different location (this is what I mean by a total shuffle). If only part of the result is important to the caller then they'll just extract the part they need. Here shape essentially refers to the number of subvectors that are logically contained within Vec (interleave) or B (deinterleave) and for this initial implementation we'd restrict support to just the value 2. The main usage rule is Ty.getKnownMinElementCount() must be divisible by shape.

Do you see any issues here? My thinking is that it becomes trivial to see how we'd support other strides (i.e. we'd just extend the verifier to allow shape=new-stride).
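
One way to read the "total shuffle" proposal is sketched below on fixed-length vectors. `deinterleaveTotal` and `interleaveTotal` are hypothetical names. The lane layout is an interpretation of the description above: every input lane appears exactly once in the output, and `shape` is the number of logical subvectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical total deinterleave: the output is the concatenation of `shape`
// subvectors, where subvector r holds lanes r, r + shape, r + 2 * shape, ...
std::vector<int> deinterleaveTotal(const std::vector<int> &vec, std::size_t shape) {
  assert(shape > 0 && vec.size() % shape == 0 && "lane count must be divisible by shape");
  std::vector<int> out;
  out.reserve(vec.size());
  for (std::size_t r = 0; r < shape; ++r)
    for (std::size_t i = r; i < vec.size(); i += shape)
      out.push_back(vec[i]);
  return out;
}

// Hypothetical total interleave: treats vec as `shape` concatenated subvectors
// and emits their lanes round-robin. It is the inverse of deinterleaveTotal.
std::vector<int> interleaveTotal(const std::vector<int> &vec, std::size_t shape) {
  assert(shape > 0 && vec.size() % shape == 0 && "lane count must be divisible by shape");
  const std::size_t sub = vec.size() / shape;
  std::vector<int> out;
  out.reserve(vec.size());
  for (std::size_t i = 0; i < sub; ++i)
    for (std::size_t j = 0; j < shape; ++j)
      out.push_back(vec[j * sub + i]);
  return out;
}
```

Under this reading a caller who only wants the even lanes would extract the first subvector of the result, which is where the later discussion about extract operations comes from.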

CodeGen:
As you know the vector types here are critical because we must be able to legalise all supported variants. I prefer how you've defined ISD::VECTOR_INTERLEAVE over ISD::VECTOR_DEINTERLEAVE because it represents a total shuffle.

However, my proposal is to match the above intrinsic interface but replace the single vector in/out rule with one that dictates the ISD nodes must have a matching number of vector inputs and outputs with all having the same type. The shape operand remains as is and the operations are defined to first concatenate all N vector operands, perform the necessary shuffle (based on the shape), before the result is then evenly split into N vectors.

The important thing here is that shape does not dictate anything about the number of vectors. Once all type legalisation is in place I'd expect a simple mapping from intrinsic to ISD node. After type legalisation the vector counts might change but shape does not. I think the shape gives enough information to guide type/operation legalisation in the best order to split, promote or widen the vectors?

I don't think this deviates much from your current design but does provide more extensibility. Given this patch is not worrying about type legalisation the biggest change is likely to be to the operation descriptions, but I'd appreciate a little tire kicking to see if I'm misrepresenting the benefits. What do you think?

Paul,
Let me know if I understood correctly.
You are suggesting that:

Intrinsic have only 1 vec input and 1 vec output.

<vscale x 4 x i64> @llvm.experimental.vector.deinterleave.nxv4i64(<vscale x 4 x i64> %vec, i64 2)
<vscale x 4 x i64> @llvm.experimental.vector.interleave.nxv4i64(<vscale x 4 x i64> %vec, i64 2)

ISD node have shapes in and out vectors. For instance, if shape is 2:

RES0, RES1 = DEINTERLEAVE (VEC0, VEC1)
RES0, RES1 = INTERLEAVE (VEC0, VEC1)

and all the vector sizes(input and output) are equal

If I understood it correctly, these are my 2 cents:
A)This solution is more configurable than the one we are proposing. So the intrinsic verifier needs to have rules to avoid mistakes like:

A.1) Shape not proportional to the minimum number of elements (as you wrote:

Ty.getKnownMinElementCount() must be divisible by shape)

For instance:
It should not be possible to do:

<vscale x 7 x i64> @llvm.experimental.vector.deinterleave.nxv7i64(<vscale x 7 x i64> %vec, i64 2)

At the moment the intrinsic is checked by tablegen(TargetSelectionDAG), but as you proposed we need to add code for it in Verifier.cpp. I believe I cannot use tablegen to make sure that shape is proportional to the minimum number of elements

A.2) Shapes that are not supported. Like:

<vscale x 12 x i64> @llvm.experimental.vector.interleave.nxv12i64(<vscale x 12 x i64> %vec, i64 4)

<vscale x 6 x i64> @llvm.experimental.vector.interleave.nxv6i64(<vscale x 6 x i64> %vec, i64 3)

B) For DEINTERLEAVE we can mix even and odd vectors when extracting without being very clear.
(If I could give my suggestion too, we should not return deinterleave as a concatenated vector, but as a struct of N vectors(2 in this case))
The reason is that:
Imagine we want to deinterleave a vector like <v6i64><i64 0, i64 1, i64 2, i64 3, i64 4, i64 5>, with the result stored in <v6i64> Res and a shape equal to 2. It should split the vector into even and odd elements.
So the llvm-IR would be something like:

Res = <v6i64> @llvm.experimental.vector.deinterleave.v6i64(<v6i64><i64 0, i64 1, i64 2, i64 3, i64 4, i64 5>, i64 2)
Even= <v2i64> @llvm.extract.vector.v2i64(<v6i64> %Res, i64 0)
Odd = <v4i64> @llvm.extract.vector.v4i64(<v6i64> %Res, i64 2)

The return vector is:
Res =<v6i64><i64 0, i64 2, i64 4, i64 1, i64 3, i64 5>
and Even and Odd are:
Even= <v2i64><i64 0, i64 2>
Odd =<v4i64><i64 4,i64 1, i64 3, i64 5>
We still have 2 vectors, but wrongly split.

The same would not happen if the deinterleave returns a struct with even and odd vectors. We would need to do something like this to have even and odd:

Res = {<v3i64>, <v3i64>} @llvm.experimental.vector.deinterleave.v3i64(<v6i64><i64 0, i64 1, i64 2, i64 3, i64 4, i64 5>, i64 2)
Even = <v3i64> @llvm.extract.element.v3i64({<v3i64>, <v3i64>} %Res, i64 0)
Odd = <v3i64> @llvm.extract.element.v3i64({<v3i64>, <v3i64>} %Res, i64 1)

Res = {<v3i64><i64 0, i64 2, i64 4>, <v3i64><i64 1, i64 3, i64 5>}
and Even and Odd are:
Even= <v3i64><i64 0, i64 2, i64 4>
Odd =<v3i64><i64 1, i64 3, i64 5>

I believe that the following is not possible, without having to concatenate after the intrinsic call. Which, IMHO, makes it clear the intention/goal.

Even = <v2i64> @llvm.extract.vector.v2i64(<v3i64> %Res, i64 0)
Odd = <v4i64> @llvm.extract.vector.v4i64(<v3i64> %Res, i64 2)

And if we want to be similar to a fixed vector. The mask for deinterleave only returns 1 vector, which could be odd or even and not all concatenated.

If all vectors in the ISDNode are the same size I don't foresee any problem with the legalization. But if they are different, then it becomes complicated, as explained before.

To put it in a nutshell, I would say:
I am fine updating as you suggested, I do not foresee many problems. But I would like us to agree on one suggestion/solution before doing more significant changes.

In D141924#4075634, @paulwalker-arm wrote:
What about implementing total shuffles and breaking the dependence on vector types having any meaning, with this encoded as a discrete immediate operand instead. For example:
Ty A = @llvm.experimental.vector.interleave.Ty(Vec, shape)
Ty B = @llvm.experimental.vector.deinterleave.Ty(Vec, shape)
Here the intrinsics are simply vector in vector out with all input lanes existing in the output just at a different location (this is what I mean by a total shuffle). If only part of the result is important to the caller then they'll just extract the part they need. Here shape essentially refers to the number of subvectors that are logically contained within Vec (interleave) or B (deinterleave) and for this initial implementation we'd restrict support to just the value 2. The main usage rule is Ty.getKnownMinElementCount() must be divisible by shape.

Do you see any issues here? My thinking is that it becomes trivial to see how we'd support other strides (i.e. we'd just extend the verifier to allow shape=new-stride).

Interleaving requires two input vectors, so in order to do an interleave operation with the single-operand/result intrinsic, two input vectors will need to be concatenated (in IR) using llvm.vector.insert operations. There may be issues when these operations are hoisted to a different block, in the sense that the EXTRACT_SUBVECTOR nodes inserted as part of lowering to the ISD nodes are not folded away and result in actual code.

If we implement a 'total shuffle' with multiple operands/results (resulting in both lo/hi for interleave and even/odd for deinterleave), then we avoid having to insert artificial vector.insert/extract operations and it also becomes more clear which part of the calculation is used. Individual results are stored in separate virtual registers which I suspect may make it easier for the compiler to know what parts to deadcode, especially when the 'concat' of the results would otherwise lead to actual instructions (e.g. for SVE concat(<vscale x 2 x i32>,<vscale x 2 x i32>) -> <vscale x 4 x i32>).

Do you see any specific advantages to having a single vector operand/result?

rscottmanley added a subscriber: rscottmanley. Jan 30 2023, 11:29 AM

Hi @CarolineConcatto & @sdesmalen, sorry for the delay in responding (I had a phabricator free week :) ). The above all sounds sensible to me. In a way I'm playing devil's advocate based on the original responses because the one advantage of having single input single output intrinsics is that it allows one intrinsic to be used (or rather extended) to allow for future shapes. It's clear from your responses that, whilst possible, such a design is likely impractical. To me this means we can drop all pretence of worrying about other shapes because we're concluding each shape is best handled via a dedicated intrinsic (and presumably ISD nodes). This is great news because it keeps things simple. However, we should avoid strictly modelling the intrinsics on how we expect AArch64 code generation to look and so I still prefer the total shuffle representation, which by the sounds of it you are both happy with?

I believe that leaves us with:

{<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2(<vscale x 4 x i64>)
<vscale x 4 x i64> @llvm.experimental.vector.interleave2(<vscale x 2 x i64>, <vscale x 2 x i64>)

Which also quite nicely fits with how LoopVectorize works whereby it typically wants "big_load->deinterleave" and "interleave->big_store" idioms.

The ISD nodes presumably follow a similar pattern albeit with the double width vectors most likely represented as multiple vectors due to the way legalisation works. Here I think the ISD nodes will still benefit by allowing an arbitrary number of vector inputs and outputs but if we're locking the ISD nodes to a specific shape then I'm less concerned about that because the nodes can be extended after this initial implementation if need be.

Allen added a subscriber: Allen. Feb 2 2023, 1:35 AM

Address review comments.

Change the deinterleave intrinsic and ISD Node to return odd and even
Add to the intrinsic names the stride 2:

	     experimental.vector.deinterleave  to experimental.vector.deinterleave2
	     experimental.vector.interleave  to experimental.vector.interleave2

Harbormaster completed remote builds in B211701: Diff 494598. Feb 3 2023, 7:15 AM

paulwalker-arm added inline comments. Feb 5 2023, 5:04 AM

llvm/include/llvm/IR/Intrinsics.td
2125–2133	These intrinsic definitions don't look to match the langref description or tests. The text says:

declare <4 x double> @llvm.experimental.vector.interleave2.v2f64(<2 x double> %vec1, <2 x double> %vec2)

making `v2f64` the overloaded type from which the others (i.e. the `<4 x double>` return type) are derived. However, `int_experimental_vector_interleave2` is defined such that the return type is the overloaded type. You can see this if you run your tests through `opt`. For example, interleave2_nxv2f16 becomes:

define <vscale x 4 x half> @interleave2_nxv2f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1) #0 {
  %retval = call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1)
  ret <vscale x 4 x half> %retval
}

For what it's worth I think the text is correct because overloading the smaller type will look more consistent when adding other shapes. Which is to say I have a slight preference for:

<vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv2f16(<vscale x 2 x half>...
<vscale x 6 x half> @llvm.experimental.vector.interleave3.nxv2f16(<vscale x 2 x half>...

That will mean adding `LLVMDoubleElementsVectorType` though, unless there's already a class that'll do what we need.

The overloaded type used in the LangRef examples and in the test files was incorrect: it was shown as the smallest vector type in the function. The problem is that there is only LLVMHalfElementsVectorType to describe the dependency between the type sizes, so the overloaded type needs to be the biggest vector type in the function.

CarolineConcatto added inline comments. Feb 7 2023, 1:30 AM

llvm/include/llvm/IR/Intrinsics.td
2125–2133	I believe we want to keep it like this and not create an LLVMDoubleElementsVectorType. AFAIU we should use what we already have available in LLVM. Moreover, if we want more strides in the future we may not even have to use LLVMHalfElementsVectorType. But again, if you have a good argument for the use of LLVMDoubleElementsVectorType I can change it. ATM I have only updated the tests and the examples to use the biggest vector type as the overloaded type.

Harbormaster completed remote builds in B212310: Diff 495421. Feb 7 2023, 3:34 AM

Just want to say explicitly that I'm fine with the direction this has evolved, and that once review completes, I have no issue with this landing.

craig.topper added inline comments. Feb 9 2023, 12:07 AM

llvm/docs/LangRef.rst
17722	work -> works
17757	work -> works

Hi @CarolineConcatto, it's fair to say most of my comments here are likely petty and relate more to the documentation side of things where I'd rather be more explicit in describing the operations being added. That said, they're all just suggestions for you to decide what to do with, if anything. That just leaves the way OutVT is calculated and the use of getVTList() as changes I more strongly encourage. Otherwise I think the patch looks great.

llvm/docs/LangRef.rst
17719	"constructs" I've offered an alternate description of the ISD nodes that might be worth adapting for the intrinsic text if you think there's value.
17722–17723	Perhaps just "While this intrinsic supports all vector types the recommended...."?
17736	"The argument is a vector whose type corresponds to the logical concatenation of the two result types."?
17757	As above.
17770–17771	"Both arguments must be vectors of the same type whereby their logical concatenation matches the result type."?
llvm/include/llvm/CodeGen/ISDOpcodes.h
574–577	It's worth being explicit as to what "from VEC1 and VEC2" means. Perhaps: Returns two vectors with all input and output vectors having the same type. The first output contains the even indices from CONCAT_VECTORS(VEC1, VEC2), with the second output containing the odd indices. The relative order of elements within an output match that of the concatenated input.
580–584	Similar to the above, perhaps: Returns two vectors with all input and output vectors having the same type. The first output contains the result of interleaving the low half of CONCAT_VECTORS(VEC1, VEC2), with the second output containing the result of interleaving the high half.
llvm/include/llvm/IR/Intrinsics.td
2125–2133	You'll need to extend the overloaded type support anyway when adding support for other strides so I don't really buy the argument. That said, the type to overload on is subjective so if you prefer to use the bigger type then sure I can live with that.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11565–11566	Given how the intrinsic is defined, what do you think of using `EVT OutVT = InVec.getValueType().getHalfNumVectorElementsVT();` rather than the more expensive looking call to ComputeValueVTs.
11576	"Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing legalisation and combines."
11579	It's better to be specific, so `OutVTs[0].isFixedLengthVector()`.
11598	Not that bothered but does the array give us anything useful over the more typical: SDValue InVec1 = getValue(I.getOperand(0)); SDValue InVec2 = getValue(I.getOperand(1));
11602–11604	Similar typos as mentioned in visitVectorDeinterleave.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24070	I think `Expected scalable vector` better reflects the assert.
24075	I think `Op->getVTList()` should work here?
24084	Same comment as with LowerVECTOR_DEINTERLEAVE.
24090	Same comment as with LowerVECTOR_DEINTERLEAVE.
llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
124–126	I don't think this test offers any value. It's really showing how `VECTOR_SHUFFLE` is legalised, which this patch doesn't care about.
llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll
122	As with the deinterleave case I don't think this is testing anything the patch really cares about and my worry is it'll just cause unnecessary pain for unrelated patches. If we plan to significantly improve the code generation then fine but if not then perhaps they're best removed?
llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
13	Please can you indent the IR here and for the other functions in this file.
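
The ISD node descriptions suggested in the ISDOpcodes.h comments above (two inputs and two outputs, all of the same type) can be modelled as follows. This is a sketch under an assumption: "interleaving the low half" and "the high half" are read here as the low and high halves of the full interleaved sequence of VEC1 and VEC2. The function names are hypothetical and fixed-length `std::vector`s stand in for the value types.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// Model of VECTOR_DEINTERLEAVE(VEC1, VEC2): the first result holds the even
// indices of CONCAT_VECTORS(VEC1, VEC2), the second result holds the odd indices.
std::pair<Vec, Vec> vectorDeinterleaveNode(const Vec &v1, const Vec &v2) {
  assert(v1.size() == v2.size() && "inputs must have the same type");
  Vec concat(v1);
  concat.insert(concat.end(), v2.begin(), v2.end());
  Vec even, odd;
  for (std::size_t i = 0; i + 1 < concat.size(); i += 2) {
    even.push_back(concat[i]);
    odd.push_back(concat[i + 1]);
  }
  return {even, odd};
}

// Model of VECTOR_INTERLEAVE(VEC1, VEC2) under the stated assumption: build the
// full interleaved sequence v1[0], v2[0], v1[1], v2[1], ... and split it in half.
std::pair<Vec, Vec> vectorInterleaveNode(const Vec &v1, const Vec &v2) {
  assert(v1.size() == v2.size() && "inputs must have the same type");
  Vec full;
  for (std::size_t i = 0; i < v1.size(); ++i) {
    full.push_back(v1[i]);
    full.push_back(v2[i]);
  }
  const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(v1.size());
  return {Vec(full.begin(), full.begin() + half), Vec(full.begin() + half, full.end())};
}
```

Because every operand and result has the same length, splitting or widening one of them during legalisation affects all of them in the same way, which is the property argued for earlier in the thread.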

luke added a subscriber: luke. Feb 9 2023, 10:38 AM

luke added inline comments. Feb 10 2023, 6:08 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24075	Could you use `DAG.getMergeValues` here too?

Address review comments

Indent sve tests

Remove .patch file

Remove unwanted changes in LangRef for vector-reverse

Thank you all for the suggestion.
I believe I have addressed all of them.
Carol

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11565–11566	Thank you, I was not aware of this function.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24075	Thank you Luke, also missed that function when looking at MERGE.

Fix LangRef for deinterleave and interleave arguments.
Fix extra space in sve-vector-deinterleave.ll

Harbormaster completed remote builds in B213385: Diff 496911. Feb 13 2023, 5:10 AM

A couple of minor requests but otherwise the patch looks good.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571	It's better to use `getVectorIdxConstant` here.
11573	As above.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1231–1232	No change required, the following is just an observational comment. I just wanted to highlight there are no tests for the `MVT::nxv1i1` cases, which I'm guessing triggers an isel failure as there's as yet no support for the zip and uzp nodes for this type? Given legalisation for these nodes will follow this patch I'm assuming there isn't a route that is crash free today.
llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
44	Can you move this function before `vector_deinterleave_v2f32_v4f32` to keep the related element types together.
llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
41	Can you move this function before `vector_deinterleave_nxv2f32_nxv4f32` to keep the related element types together.

This revision is now accepted and ready to land. Feb 13 2023, 11:01 AM

luke added inline comments. Feb 15 2023, 3:51 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571	Seconded, I'm working on a patch to lower these intrinsics to RISC-V and it currently throws an assertion here on RV32 as it doesn't support MVT::i64 constants.

luke mentioned this in D144092: [RISCV] Lower interleave and deinterleave intrinsics. Feb 15 2023, 4:06 AM

luke added a child revision: D144092: [RISCV] Lower interleave and deinterleave intrinsics. Feb 15 2023, 4:06 AM

CarolineConcatto marked 2 inline comments as done. Feb 15 2023, 5:21 AM

CarolineConcatto added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571	Thank you Paul, I did not know about this function.
11571	@luke Does it solve your problem if we replace it with getVectorIdxConstant, as suggested by Paul?

luke added inline comments. Feb 15 2023, 5:32 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571	Yeah, I tried it out locally and it fixes it!

Use getVectorIdxConstant for the EXTRACT_SUBVECTOR index in visitVectorDeinterleave.
Address nits in the test files.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
11571	Wonderful, I will update the patch!

Harbormaster completed remote builds in B213888: Diff 497663. Feb 15 2023, 7:55 AM

craig.topper mentioned this in D144143: [RISCV] Improve isInterleaveShuffle to handle interleaving the high half and low half of the same source. Feb 15 2023, 3:22 PM

luke added a child revision: D144175: [RISCV] Combine (store/load interleave,deinterleave) into vsseg2/vlseg2. Feb 16 2023, 2:19 AM

luke removed a child revision: D144175: [RISCV] Combine (store/load interleave,deinterleave) into vsseg2/vlseg2. Feb 17 2023, 4:09 AM

craig.topper mentioned this in rG42944abf8583: [RISCV] Improve isInterleaveShuffle to handle interleaving the high half and…. Feb 17 2023, 10:01 AM

This revision was landed with ongoing or failed builds. Feb 20 2023, 4:44 AM

Closed by commit rGd515ecca6834: [IR] Add new intrinsics interleave and deinterleave vectors (authored by CarolineConcatto, committed by sdesmalen).

This revision was automatically updated to reflect the committed changes.

sdesmalen added a commit: rGd515ecca6834: [IR] Add new intrinsics interleave and deinterleave vectors.

reames mentioned this in D134438: POC patch to demonstrate how new intrinsics for interleaved load/store could be used in LoopVectorize. Feb 23 2023, 7:36 AM

luke mentioned this in rG8d15e7275fe1: [RISCV] Lower interleave and deinterleave intrinsics. Feb 23 2023, 8:23 AM

mgabka mentioned this in D145163: Add support for vectorization of interleaved memory accesses for scalable VF. Mar 2 2023, 7:24 AM

paulwalker-arm mentioned this in D94444: [RFC][Scalable] Add scalable shuffle intrinsic to extract evens from a pair of vectors. Mar 6 2023, 1:46 PM

huntergr mentioned this in rG95bfb1902db9: [LV][AArch64] Allow (limited) interleaving for scalable vectors. Jun 9 2023, 3:43 AM

Revision Contents

Path                                                         Size
llvm/docs/LangRef.rst                                        69 lines
llvm/include/llvm/CodeGen/ISDOpcodes.h                       13 lines
llvm/include/llvm/IR/Intrinsics.td                           11 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h          2 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp        65 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp         2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h                2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp              38 lines
llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll       136 lines
llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll         133 lines
llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll         186 lines
llvm/test/CodeGen/AArch64/sve-vector-interleave.ll           181 lines
Diff 498800

llvm/docs/LangRef.rst

	recommended way to express reverse operations for fixed-width vectors is still			recommended way to express reverse operations for fixed-width vectors is still
	to use a shufflevector, as that may allow for more optimization opportunities.			to use a shufflevector, as that may allow for more optimization opportunities.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The argument to this intrinsic must be a vector.			The argument to this intrinsic must be a vector.

				'``llvm.experimental.vector.deinterleave2``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""
				This is an overloaded intrinsic.

				::

				declare {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double> %vec1)
				declare {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %vec1)

				Overview:
				"""""""""

				The '``llvm.experimental.vector.deinterleave2``' intrinsic constructs two
				paulwalker-arm (Done): "constructs" I've offered an alternate description of the ISD nodes that might be worth adapting for the intrinsic text if you think there's value.
				vectors by deinterleaving the even and odd lanes of the input vector.

				This intrinsic works for both fixed and scalable vectors. While this intrinsic
				craig.topper (Done): work -> works
				supports all vector types the recommended way to express this operation for
				paulwalker-arm (Done): Perhaps just "While this intrinsic supports all vector types the recommended...."?
				fixed-width vectors is still to use a shufflevector, as that may allow for more
				optimization opportunities.

				For example:

				.. code-block:: text

				{<2 x i64>, <2 x i64>} llvm.experimental.vector.deinterleave2.v4i64(<4 x i64> <i64 0, i64 1, i64 2, i64 3>); ==> {<2 x i64> <i64 0, i64 2>, <2 x i64> <i64 1, i64 3>}

				Arguments:
				""""""""""

				The argument is a vector whose type corresponds to the logical concatenation of
				paulwalker-arm (Done): "The argument is a vector whose type corresponds to the logical concatenation of the two result types."?
				the two result types.

				'``llvm.experimental.vector.interleave2``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""
				This is an overloaded intrinsic.

				::

				declare <4 x double> @llvm.experimental.vector.interleave2.v4f64(<2 x double> %vec1, <2 x double> %vec2)
				declare <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2)

				Overview:
				"""""""""

				The '``llvm.experimental.vector.interleave2``' intrinsic constructs a vector
				by interleaving two input vectors.

				This intrinsic works for both fixed and scalable vectors. While this intrinsic
				craig.topper (Done): work -> works
				paulwalker-arm (Done): As above.
				supports all vector types the recommended way to express this operation for
				fixed-width vectors is still to use a shufflevector, as that may allow for more
				optimization opportunities.

				For example:

				.. code-block:: text

				<4 x i64> llvm.experimental.vector.interleave2.v4i64(<2 x i64> <i64 0, i64 2>, <2 x i64> <i64 1, i64 3>); ==> <4 x i64> <i64 0, i64 1, i64 2, i64 3>

				Arguments:
				""""""""""
				Both arguments must be vectors of the same type whereby their logical
				concatenation matches the result type.
				paulwalker-arm (Done): "Both arguments must be vectors of the same type whereby their logical concatenation matches the result type."?

	'``llvm.experimental.vector.splice``' Intrinsic			'``llvm.experimental.vector.splice``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""
	This is an overloaded intrinsic.			This is an overloaded intrinsic.

	::			::
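
Editor's sketch (not part of the patch): the two LangRef examples above can be modelled on plain fixed-width containers. The helper names `deinterleave2` and `interleave2` are hypothetical stand-ins for the intrinsics. They operate on `std::vector<int>` rather than LLVM vector types and say nothing about how scalable vectors are lowered.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of llvm.experimental.vector.deinterleave2 on a
// fixed-width vector: even lanes form the first result, odd lanes the second.
std::pair<std::vector<int>, std::vector<int>>
deinterleave2(const std::vector<int> &vec) {
  assert(vec.size() % 2 == 0 && "input must have an even number of lanes");
  std::vector<int> even, odd;
  for (std::size_t i = 0; i < vec.size(); i += 2) {
    even.push_back(vec[i]);
    odd.push_back(vec[i + 1]);
  }
  return {even, odd};
}

// Hypothetical model of llvm.experimental.vector.interleave2: lane i of each
// input lands at lanes 2*i and 2*i+1 of the result.
std::vector<int> interleave2(const std::vector<int> &vec1,
                             const std::vector<int> &vec2) {
  assert(vec1.size() == vec2.size() && "inputs must have the same type");
  std::vector<int> out;
  for (std::size_t i = 0; i < vec1.size(); ++i) {
    out.push_back(vec1[i]);
    out.push_back(vec2[i]);
  }
  return out;
}
```

With the input `<0, 1, 2, 3>` this model yields `<0, 2>` and `<1, 3>`, and interleaving those two halves gives back the original vector, matching the two LangRef examples.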

llvm/include/llvm/CodeGen/ISDOpcodes.h

enum NodeType {
/// condition cannot be determined statically but is false at runtime, then		/// condition cannot be determined statically but is false at runtime, then
/// the result vector is undefined. The IDX parameter must be a vector index		/// the result vector is undefined. The IDX parameter must be a vector index
/// constant type, which for most targets will be an integer pointer type.		/// constant type, which for most targets will be an integer pointer type.
///		///
/// This operation supports extracting a fixed-width vector from a scalable		/// This operation supports extracting a fixed-width vector from a scalable
/// vector, but not the other way around.		/// vector, but not the other way around.
EXTRACT_SUBVECTOR,		EXTRACT_SUBVECTOR,

		/// VECTOR_DEINTERLEAVE(VEC1, VEC2) - Returns two vectors with all input and
		reames (Not Done): The choice here to represent the longer vector type as two vectors which are implicitly concatenated is interesting. Can you explain why you made that choice? Is it important for legalization?
		CarolineConcatto (Author, Done): It is more complicated to legalise when the input and output have different sizes; keeping all inputs and outputs the same size makes legalisation simpler.
		sdesmalen (Not Done): That's right, legalisation becomes simpler when all the types are the same. For example, when the input vector is illegal (too wide) but the vector output is legal, we'd need to split the operation into two.
			// Assuming that nxv4i32 is legal, and nxv8i32 needs splitting
			nxv4i32 deinterleave(nxv8i32, 0)
			->
			// Now the input vector is legal, but the output type of deinterleave (nxv2i32)
			// is illegal and the operation needs further promotion or widening
			nxv4i32 concat(nxv2i32 deinterleave(nxv4i32 extract_lo(nxv8i32), 0),
			               nxv2i32 deinterleave(nxv4i32 extract_hi(nxv8i32), 0))
		Whereas, if we split the vector such that we have `nxv4i32 deinterleave(nxv4i32, nxv4i32, 0)`, then all the types are legal (or illegal, when using a different example) at the same time.
		/// output vectors having the same type. The first output contains the even
		/// indices from CONCAT_VECTORS(VEC1, VEC2), with the second output
		/// containing the odd indices. The relative order of elements within an
		paulwalker-arm (Done): It's worth being explicit as to what "from VEC1 and VEC2" means. Perhaps:
			Returns two vectors with all input and output vectors having the same type.
			The first output contains the even indices from CONCAT_VECTORS(VEC1, VEC2),
			with the second output containing the odd indices. The relative order of
			elements within an output match that of the concatenated input.
		/// output match that of the concatenated input.
		VECTOR_DEINTERLEAVE,

		/// VECTOR_INTERLEAVE(VEC1, VEC2) - Returns two vectors with all input and
		/// output vectors having the same type. The first output contains the
		/// result of interleaving the low half of CONCAT_VECTORS(VEC1, VEC2), with
		/// the second output containing the result of interleaving the high half.
		paulwalker-arm (Done): Similar to the above, perhaps:
			Returns two vectors with all input and output vectors having the same type.
			The first output contains the result of interleaving the low half of
			CONCAT_VECTORS(VEC1, VEC2), with the second output containing the result of
			interleaving the high half.
		VECTOR_INTERLEAVE,

/// VECTOR_REVERSE(VECTOR) - Returns a vector, of the same type as VECTOR,		/// VECTOR_REVERSE(VECTOR) - Returns a vector, of the same type as VECTOR,
/// whose elements are shuffled using the following algorithm:		/// whose elements are shuffled using the following algorithm:
/// RESULT[i] = VECTOR[VECTOR.ElementCount - 1 - i]		/// RESULT[i] = VECTOR[VECTOR.ElementCount - 1 - i]
VECTOR_REVERSE,		VECTOR_REVERSE,

/// VECTOR_SHUFFLE(VEC1, VEC2) - Returns a vector, of the same type as		/// VECTOR_SHUFFLE(VEC1, VEC2) - Returns a vector, of the same type as
/// VEC1/VEC2. A VECTOR_SHUFFLE node also contains an array of constant int		/// VEC1/VEC2. A VECTOR_SHUFFLE node also contains an array of constant int
/// values that indicate which value (or undef) each result element will		/// values that indicate which value (or undef) each result element will
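
Editor's sketch (not part of the patch): unlike the intrinsics, both ISD nodes take two operands and produce two results, all of the same type. The model below uses hypothetical names (`modelVectorDeinterleave`, `modelVectorInterleave`) and `std::vector<int>` lanes. The interleave half reflects my reading of the comment combined with `visitVectorInterleave` later in this diff, where the two results are joined with CONCAT_VECTORS to form the interleaved vector. Treat it as an assumption rather than a specification.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Lanes = std::vector<int>;

// Model of VECTOR_DEINTERLEAVE(VEC1, VEC2): even indices of
// CONCAT_VECTORS(VEC1, VEC2) go to the first output, odd indices to the second.
std::pair<Lanes, Lanes> modelVectorDeinterleave(const Lanes &vec1,
                                                const Lanes &vec2) {
  assert(vec1.size() == vec2.size() && "operands must have the same type");
  Lanes concat(vec1);
  concat.insert(concat.end(), vec2.begin(), vec2.end());
  Lanes even, odd;
  for (std::size_t i = 0; i < concat.size(); i += 2) {
    even.push_back(concat[i]);
    odd.push_back(concat[i + 1]);
  }
  return {even, odd};
}

// Model of VECTOR_INTERLEAVE(VEC1, VEC2): interleave the operands lane by
// lane, then return the low and high halves of that result as two outputs.
std::pair<Lanes, Lanes> modelVectorInterleave(const Lanes &vec1,
                                              const Lanes &vec2) {
  assert(vec1.size() == vec2.size() && "operands must have the same type");
  Lanes zipped;
  for (std::size_t i = 0; i < vec1.size(); ++i) {
    zipped.push_back(vec1[i]);
    zipped.push_back(vec2[i]);
  }
  const auto mid = zipped.begin() + static_cast<std::ptrdiff_t>(vec1.size());
  return {Lanes(zipped.begin(), mid), Lanes(mid, zipped.end())};
}
```

Keeping every operand and result the same width is the property the review thread cares about: if one type needs splitting or widening during legalisation, they all do.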

llvm/include/llvm/IR/Intrinsics.td

	def int_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],			def int_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
	[LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],			[LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],
	[IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;			[IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<2>>]>;

	def int_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],			def int_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty, llvm_i64_ty],			[llvm_anyvector_ty, llvm_i64_ty],
	[IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<1>>]>;			[IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<1>>]>;


				def int_experimental_vector_interleave2 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
				[LLVMHalfElementsVectorType<0>,
				LLVMHalfElementsVectorType<0>],
				[IntrNoMem]>;

				def int_experimental_vector_deinterleave2 : DefaultAttrsIntrinsic<[LLVMHalfElementsVectorType<0>,
				LLVMHalfElementsVectorType<0>],
				[llvm_anyvector_ty],
				[IntrNoMem]>;
				paulwalker-arm (Not Done): These intrinsic definitions don't look to match the langref description or tests. The text says:
					declare <4 x double> @llvm.experimental.vector.interleave2.v2f64(<2 x double> %vec1, <2 x double> %vec2)
				making `v2f64` the overloaded type from which the others (i.e. the `<4 x double>` return type) are derived. However, `int_experimental_vector_interleave2` is defined such that the return type is the overloaded type. You can see this if you run your tests through `opt`. For example, interleave2_nxv2f16 becomes:
					define <vscale x 4 x half> @interleave2_nxv2f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1) #0 {
					  %retval = call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1)
					  ret <vscale x 4 x half> %retval
					}
				For what it's worth I think the text is correct because overloading the smaller type will look more consistent when adding other shapes. Which is to say I have a slight preference for:
					<vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv2f16(<vscale x 2 x half>...
					<vscale x 6 x half> @llvm.experimental.vector.interleave3.nxv2f16(<vscale x 2 x half>...
				That will mean adding `LLVMDoubleElementsVectorType` though, unless there's already a class that'll do what we need.
				CarolineConcatto (Author, Done): I believe we want to keep it like this and not create an LLVMDoubleElementsVectorType. AFAIU we should use what we already have available in LLVM. Moreover, if we want more strides in the future we may not even have to use LLVMHalfElementsVectorType. But again, if you have a good argument for using LLVMDoubleElementsVectorType I can change it. ATM I have only updated the tests and the examples to use the biggest type as the overloaded type.
				paulwalker-arm (Not Done): You'll need to extend the overloaded type support anyway when adding support for other strides so I don't really buy the argument. That said, the type to overload on is subjective so if you prefer to use the bigger type then sure I can live with that.

	//===----------------- Pointer Authentication Intrinsics ------------------===//			//===----------------- Pointer Authentication Intrinsics ------------------===//
	//			//

	// Sign an unauthenticated pointer using the specified key and discriminator,			// Sign an unauthenticated pointer using the specified key and discriminator,
	// passed in that order.			// passed in that order.
	// Returns the first argument, with some known bits replaced with a signature.			// Returns the first argument, with some known bits replaced with a signature.
	def int_ptrauth_sign :			def int_ptrauth_sign :
	DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],			DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

private:

// These two are implemented in StatepointLowering.cpp		// These two are implemented in StatepointLowering.cpp
void visitGCRelocate(const GCRelocateInst &Relocate);		void visitGCRelocate(const GCRelocateInst &Relocate);
void visitGCResult(const GCResultInst &I);		void visitGCResult(const GCResultInst &I);

void visitVectorReduce(const CallInst &I, unsigned Intrinsic);		void visitVectorReduce(const CallInst &I, unsigned Intrinsic);
void visitVectorReverse(const CallInst &I);		void visitVectorReverse(const CallInst &I);
void visitVectorSplice(const CallInst &I);		void visitVectorSplice(const CallInst &I);
		void visitVectorInterleave(const CallInst &I);
		void visitVectorDeinterleave(const CallInst &I);
void visitStepVector(const CallInst &I);		void visitStepVector(const CallInst &I);

void visitUserOp1(const Instruction &I) {		void visitUserOp1(const Instruction &I) {
llvm_unreachable("UserOp1 should not exist at instruction selection time!");		llvm_unreachable("UserOp1 should not exist at instruction selection time!");
}		}
void visitUserOp2(const Instruction &I) {		void visitUserOp2(const Instruction &I) {
llvm_unreachable("UserOp2 should not exist at instruction selection time!");		llvm_unreachable("UserOp2 should not exist at instruction selection time!");
}		}

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/MemoryLocation.h"		#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/CodeGen/Analysis.h"		#include "llvm/CodeGen/Analysis.h"
#include "llvm/CodeGen/AssignmentTrackingAnalysis.h"		#include "llvm/CodeGen/AssignmentTrackingAnalysis.h"
#include "llvm/CodeGen/CodeGenCommonISel.h"		#include "llvm/CodeGen/CodeGenCommonISel.h"
#include "llvm/CodeGen/FunctionLoweringInfo.h"		#include "llvm/CodeGen/FunctionLoweringInfo.h"
#include "llvm/CodeGen/GCMetadata.h"		#include "llvm/CodeGen/GCMetadata.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
...
case Intrinsic::experimental_vector_reverse:
visitVectorReverse(I);		visitVectorReverse(I);
return;		return;
case Intrinsic::experimental_vector_splice:		case Intrinsic::experimental_vector_splice:
visitVectorSplice(I);		visitVectorSplice(I);
return;		return;
case Intrinsic::callbr_landingpad:		case Intrinsic::callbr_landingpad:
visitCallBrLandingPad(I);		visitCallBrLandingPad(I);
return;		return;
		case Intrinsic::experimental_vector_interleave2:
		visitVectorInterleave(I);
		return;
		case Intrinsic::experimental_vector_deinterleave2:
		visitVectorDeinterleave(I);
		return;
}		}
}		}

void SelectionDAGBuilder::visitConstrainedFPIntrinsic(		void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
const ConstrainedFPIntrinsic &FPI) {		const ConstrainedFPIntrinsic &FPI) {
SDLoc sdl = getCurSDLoc();		SDLoc sdl = getCurSDLoc();

// We do not need to serialize constrained FP intrinsics against		// We do not need to serialize constrained FP intrinsics against
...
void SelectionDAGBuilder::visitVectorReverse(const CallInst &I) {
SmallVector<int, 8> Mask;		SmallVector<int, 8> Mask;
unsigned NumElts = VT.getVectorMinNumElements();		unsigned NumElts = VT.getVectorMinNumElements();
for (unsigned i = 0; i != NumElts; ++i)		for (unsigned i = 0; i != NumElts; ++i)
Mask.push_back(NumElts - 1 - i);		Mask.push_back(NumElts - 1 - i);

setValue(&I, DAG.getVectorShuffle(VT, DL, V, DAG.getUNDEF(VT), Mask));		setValue(&I, DAG.getVectorShuffle(VT, DL, V, DAG.getUNDEF(VT), Mask));
}		}

		void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I) {
		auto DL = getCurSDLoc();
		SDValue InVec = getValue(I.getOperand(0));
		EVT OutVT =
		InVec.getValueType().getHalfNumVectorElementsVT(*DAG.getContext());

		paulwalker-arm (Done): Given how the intrinsic is defined, what do you think of using `EVT OutVT = InVec.getValueType().getHalfNumVectorElementsVT();` rather than the more expensive looking call to ComputeValueVTs?
		CarolineConcatto (Author, Done): Thank you, I was not aware of this function.
		unsigned OutNumElts = OutVT.getVectorMinNumElements();

		// ISD Node needs the input vectors split into two equal parts
		SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
		DAG.getVectorIdxConstant(0, DL));
		paulwalker-arm (Done): It's better to use `getVectorIdxConstant` here.
		CarolineConcatto (Author, Done): Thank you Paul, I did not know about this function.
		luke (Not Done): Seconded, I'm working on a patch to lower these intrinsics to RISC-V and it currently throws an assertion here on RV32 as it doesn't support MVT::i64 constants.
		CarolineConcatto (Author, Done): @luke Does it solve your problem if we replace it with getVectorIdxConstant, as suggested by Paul?
		luke (Not Done): Yeah, I tried it out locally and it fixes it!
		CarolineConcatto (Author, Done): Wonderful, I will update the patch!
		SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
		DAG.getVectorIdxConstant(OutNumElts, DL));
		reames (Not Done): Is this actually the semantics of an extract_vector on scalable vectors? Given e.g. 2, I'd expect to get a vector starting at the 3rd element, not split at the high half.
		CarolineConcatto (Author, Done): I don't know if I understand what you are writing about, but just in case, this is my explanation. The ISD node for DEINTERLEAVE needs the input vector divided in two. Splitting the input vector in half makes all operands (inputs and outputs) have the same size, which makes legalisation less complicated. For deinterleave the output is always half the size of the input:
			def int_experimental_vector_deinterleave_even : DefaultAttrsIntrinsic<[LLVMHalfElementsVectorType<0>], [llvm_anyvector_ty],
		So if we split the input in half, they should all have the same size when using the ISD node. As far as I understand, idx is a multiple of the known minimum vector length. For instance:
			<nxv2i64> extract_vector<nxv4i64>, the index can only be 0 or 2
			<nxv4i32> extract_vector<nxv8i32>, the index can only be 0 or 4
		So I believe the index is correct if we want to split the input vector in half.
			/// EXTRACT_SUBVECTOR(VECTOR, IDX) - Returns a subvector from VECTOR.
			/// Let the result type be T, then IDX represents the starting element number
			/// from which a subvector of type T is extracted. IDX must be a constant
			/// multiple of T's known minimum vector length.
		paulwalker-arm (Done): As above.

		// Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
		// legalisation and combines.
		paulwalker-arm (Done): "Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing legalisation and combines."
		if (OutVT.isFixedLengthVector()) {
		SDValue Even = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
		createStrideMask(0, 2, OutNumElts));
		paulwalker-arm (Done): It's better to be specific, so `OutVTs[0].isFixedLengthVector()`.
		SDValue Odd = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
		reames (Done): This should be createStrideMask in VectorUtils.h, right?
		createStrideMask(1, 2, OutNumElts));
		SDValue Res = DAG.getMergeValues({Even, Odd}, getCurSDLoc());
		setValue(&I, Res);
		return;
		}

		SDValue Res = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
		DAG.getVTList(OutVT, OutVT), Lo, Hi);
		setValue(&I, Res);
		return;
		}

		void SelectionDAGBuilder::visitVectorInterleave(const CallInst &I) {
		auto DL = getCurSDLoc();
		EVT InVT = getValue(I.getOperand(0)).getValueType();
		SDValue InVec0 = getValue(I.getOperand(0));
		SDValue InVec1 = getValue(I.getOperand(1));
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		paulwalker-arm (Done): Not that bothered but does the array give us anything useful over the more typical:
			SDValue InVec1 = getValue(I.getOperand(0));
			SDValue InVec2 = getValue(I.getOperand(1));
		EVT OutVT = TLI.getValueType(DAG.getDataLayout(), I.getType());

		// Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
		// legalisation and combines.
		reames (Done): Comment says deinterleave, fix.
		if (OutVT.isFixedLengthVector()) {
		unsigned NumElts = InVT.getVectorMinNumElements();
		paulwalker-arm (Done): Similar typos as mentioned in visitVectorDeinterleave.
		SDValue V = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, InVec0, InVec1);
		reames (Done): createInterleaveMask - though, I think you chose a different strategy using the concat_vector here. Make sure you keep it consistent if you change one part.
		reames (Done): Ignore me on the second part of the comment here; I'd misread. Reusing createInterleaveMask would still be good.
		setValue(&I, DAG.getVectorShuffle(OutVT, DL, V, DAG.getUNDEF(OutVT),
		createInterleaveMask(NumElts, 2)));
		return;
		}

		SDValue Res = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
		DAG.getVTList(InVT, InVT), InVec0, InVec1);
		Res = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Res.getValue(0),
		Res.getValue(1));
		setValue(&I, Res);
		return;
		}

void SelectionDAGBuilder::visitFreeze(const FreezeInst &I) {		void SelectionDAGBuilder::visitFreeze(const FreezeInst &I) {
SmallVector<EVT, 4> ValueVTs;		SmallVector<EVT, 4> ValueVTs;
ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), I.getType(),		ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), I.getType(),
ValueVTs);		ValueVTs);
unsigned NumValues = ValueVTs.size();		unsigned NumValues = ValueVTs.size();
if (NumValues == 0) return;		if (NumValues == 0) return;

SmallVector<SDValue, 4> Values(NumValues);		SmallVector<SDValue, 4> Values(NumValues);
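
Editor's sketch (not part of the patch): on the fixed-length path both visitors fall back to VECTOR_SHUFFLE with masks from `createStrideMask` and `createInterleaveMask` (llvm/Analysis/VectorUtils.h). The functions below are standalone re-implementations of those masks as I understand their behaviour, under the hypothetical names `strideMask`, `interleaveMask` and `shuffleLanes`. They do not call into LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for createStrideMask(Start, Stride, VF): Start, Start+Stride, ...
std::vector<int> strideMask(unsigned start, unsigned stride, unsigned vf) {
  std::vector<int> mask;
  for (unsigned i = 0; i < vf; ++i)
    mask.push_back(static_cast<int>(start + i * stride));
  return mask;
}

// Stand-in for createInterleaveMask(VF, NumVecs): for each lane i, pick lane i
// of every concatenated input vector in turn.
std::vector<int> interleaveMask(unsigned vf, unsigned numVecs) {
  std::vector<int> mask;
  for (unsigned i = 0; i < vf; ++i)
    for (unsigned j = 0; j < numVecs; ++j)
      mask.push_back(static_cast<int>(j * vf + i));
  return mask;
}

// Two-operand shuffle: mask indices below lhs.size() select from lhs, the
// remaining indices select from rhs.
std::vector<int> shuffleLanes(const std::vector<int> &lhs,
                              const std::vector<int> &rhs,
                              const std::vector<int> &mask) {
  std::vector<int> out;
  for (int idx : mask) {
    assert(idx >= 0 &&
           static_cast<std::size_t>(idx) < lhs.size() + rhs.size());
    const std::size_t i = static_cast<std::size_t>(idx);
    out.push_back(i < lhs.size() ? lhs[i] : rhs[i - lhs.size()]);
  }
  return out;
}
```

In terms of the builder code, `shuffleLanes(Lo, Hi, strideMask(0, 2, N))` and `shuffleLanes(Lo, Hi, strideMask(1, 2, N))` correspond to the Even and Odd results of `visitVectorDeinterleave`, and `interleaveMask(N, 2)` applied to the two inputs corresponds to the shuffle in `visitVectorInterleave`.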

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

#endif
case ISD::SELECT: return "select";		case ISD::SELECT: return "select";
case ISD::VSELECT: return "vselect";		case ISD::VSELECT: return "vselect";
case ISD::SELECT_CC: return "select_cc";		case ISD::SELECT_CC: return "select_cc";
case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";		case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";
case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";		case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";
case ISD::CONCAT_VECTORS: return "concat_vectors";		case ISD::CONCAT_VECTORS: return "concat_vectors";
case ISD::INSERT_SUBVECTOR: return "insert_subvector";		case ISD::INSERT_SUBVECTOR: return "insert_subvector";
case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";		case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";
		case ISD::VECTOR_DEINTERLEAVE: return "vector_deinterleave";
		case ISD::VECTOR_INTERLEAVE: return "vector_interleave";
case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";		case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";
case ISD::VECTOR_SHUFFLE: return "vector_shuffle";		case ISD::VECTOR_SHUFFLE: return "vector_shuffle";
case ISD::VECTOR_SPLICE: return "vector_splice";		case ISD::VECTOR_SPLICE: return "vector_splice";
case ISD::SPLAT_VECTOR: return "splat_vector";		case ISD::SPLAT_VECTOR: return "splat_vector";
case ISD::SPLAT_VECTOR_PARTS: return "splat_vector_parts";		case ISD::SPLAT_VECTOR_PARTS: return "splat_vector_parts";
case ISD::VECTOR_REVERSE: return "vector_reverse";		case ISD::VECTOR_REVERSE: return "vector_reverse";
case ISD::STEP_VECTOR: return "step_vector";		case ISD::STEP_VECTOR: return "step_vector";
case ISD::CARRY_FALSE: return "carry_false";		case ISD::CARRY_FALSE: return "carry_false";

llvm/lib/Target/AArch64/AArch64ISelLowering.h

private:
SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG,		SDValue LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG,
unsigned NewOp) const;		unsigned NewOp) const;
SDValue LowerToScalableOp(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerToScalableOp(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECTOR_SPLICE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVECTOR_SPLICE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerVECTOR_DEINTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerVECTOR_INTERLEAVE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDIV(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDIV(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftParts(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerShiftParts(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCTPOP_PARITY(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCTPOP_PARITY(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCTTZ(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCTTZ(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBitreverse(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBitreverse(SDValue Op, SelectionDAG &DAG) const;
(165 lines not shown)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

(1,222 lines not shown)	#undef LCALLNAME5

// FIXME: Move lowering for more nodes here if those are common between		// FIXME: Move lowering for more nodes here if those are common between
// SVE and SME.		// SVE and SME.
if (Subtarget->hasSVEorSME()) {		if (Subtarget->hasSVEorSME()) {
for (auto VT :		for (auto VT :
{MVT::nxv16i1, MVT::nxv8i1, MVT::nxv4i1, MVT::nxv2i1, MVT::nxv1i1}) {		{MVT::nxv16i1, MVT::nxv8i1, MVT::nxv4i1, MVT::nxv2i1, MVT::nxv1i1}) {
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);		setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
		setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
		setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);
		paulwalker-arm (inline comment, Not Done): No change required, the following is just an observational comment. I just wanted to highlight there are no tests for the `MVT::nxv1i1` cases, which I'm guessing triggers an isel failure as there's as yet no support for the zip and uzp nodes for this type? Given legalisation for these nodes will follow this patch I'm assuming there isn't a route that is crash free today.
}		}
}		}

if (Subtarget->hasSVE()) {		if (Subtarget->hasSVE()) {
for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {		for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
setOperationAction(ISD::BITREVERSE, VT, Custom);		setOperationAction(ISD::BITREVERSE, VT, Custom);
setOperationAction(ISD::BSWAP, VT, Custom);		setOperationAction(ISD::BSWAP, VT, Custom);
setOperationAction(ISD::CTLZ, VT, Custom);		setOperationAction(ISD::CTLZ, VT, Custom);
(29 lines not shown)	for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);		setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
setOperationAction(ISD::VECREDUCE_AND, VT, Custom);		setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
setOperationAction(ISD::VECREDUCE_OR, VT, Custom);		setOperationAction(ISD::VECREDUCE_OR, VT, Custom);
setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);		setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);
setOperationAction(ISD::VECREDUCE_UMIN, VT, Custom);		setOperationAction(ISD::VECREDUCE_UMIN, VT, Custom);
setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);		setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);
setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);		setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);		setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
		setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
		setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);

setOperationAction(ISD::UMUL_LOHI, VT, Expand);		setOperationAction(ISD::UMUL_LOHI, VT, Expand);
setOperationAction(ISD::SMUL_LOHI, VT, Expand);		setOperationAction(ISD::SMUL_LOHI, VT, Expand);
setOperationAction(ISD::SELECT_CC, VT, Expand);		setOperationAction(ISD::SELECT_CC, VT, Expand);
setOperationAction(ISD::ROTL, VT, Expand);		setOperationAction(ISD::ROTL, VT, Expand);
setOperationAction(ISD::ROTR, VT, Expand);		setOperationAction(ISD::ROTR, VT, Expand);

setOperationAction(ISD::SADDSAT, VT, Legal);		setOperationAction(ISD::SADDSAT, VT, Legal);
(125 lines not shown)	for (auto VT : {MVT::nxv2f16, MVT::nxv4f16, MVT::nxv8f16, MVT::nxv2f32,
setOperationAction(ISD::FABS, VT, Custom);		setOperationAction(ISD::FABS, VT, Custom);
setOperationAction(ISD::FP_EXTEND, VT, Custom);		setOperationAction(ISD::FP_EXTEND, VT, Custom);
setOperationAction(ISD::FP_ROUND, VT, Custom);		setOperationAction(ISD::FP_ROUND, VT, Custom);
setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);		setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom);		setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom);
setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);		setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);
setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);		setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);
setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);		setOperationAction(ISD::VECTOR_SPLICE, VT, Custom);
		setOperationAction(ISD::VECTOR_DEINTERLEAVE, VT, Custom);
		setOperationAction(ISD::VECTOR_INTERLEAVE, VT, Custom);

setOperationAction(ISD::SELECT_CC, VT, Expand);		setOperationAction(ISD::SELECT_CC, VT, Expand);
setOperationAction(ISD::FREM, VT, Expand);		setOperationAction(ISD::FREM, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);		setOperationAction(ISD::FPOW, VT, Expand);
setOperationAction(ISD::FPOWI, VT, Expand);		setOperationAction(ISD::FPOWI, VT, Expand);
setOperationAction(ISD::FCOS, VT, Expand);		setOperationAction(ISD::FCOS, VT, Expand);
setOperationAction(ISD::FSIN, VT, Expand);		setOperationAction(ISD::FSIN, VT, Expand);
setOperationAction(ISD::FSINCOS, VT, Expand);		setOperationAction(ISD::FSINCOS, VT, Expand);
(4,676 lines not shown)	SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
case ISD::BSWAP:		case ISD::BSWAP:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::BSWAP_MERGE_PASSTHRU);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::BSWAP_MERGE_PASSTHRU);
case ISD::CTLZ:		case ISD::CTLZ:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);
case ISD::CTTZ:		case ISD::CTTZ:
return LowerCTTZ(Op, DAG);		return LowerCTTZ(Op, DAG);
case ISD::VECTOR_SPLICE:		case ISD::VECTOR_SPLICE:
return LowerVECTOR_SPLICE(Op, DAG);		return LowerVECTOR_SPLICE(Op, DAG);
		case ISD::VECTOR_DEINTERLEAVE:
		return LowerVECTOR_DEINTERLEAVE(Op, DAG);
		case ISD::VECTOR_INTERLEAVE:
		return LowerVECTOR_INTERLEAVE(Op, DAG);
case ISD::STRICT_LROUND:		case ISD::STRICT_LROUND:
case ISD::STRICT_LLROUND:		case ISD::STRICT_LLROUND:
case ISD::STRICT_LRINT:		case ISD::STRICT_LRINT:
case ISD::STRICT_LLRINT: {		case ISD::STRICT_LLRINT: {
assert(Op.getOperand(1).getValueType() == MVT::f16 &&		assert(Op.getOperand(1).getValueType() == MVT::f16 &&
"Expected custom lowering of rounding operations only for f16");		"Expected custom lowering of rounding operations only for f16");
SDLoc DL(Op);		SDLoc DL(Op);
SDValue Ext = DAG.getNode(ISD::STRICT_FP_EXTEND, DL, {MVT::f32, MVT::Other},		SDValue Ext = DAG.getNode(ISD::STRICT_FP_EXTEND, DL, {MVT::f32, MVT::Other},
(17,930 lines not shown)	if (VT.bitsGE(SrcVT)) {
Val = convertFromScalableVector(DAG, SrcVT, Val);		Val = convertFromScalableVector(DAG, SrcVT, Val);

Val = DAG.getNode(ISD::TRUNCATE, DL, VT.changeTypeToInteger(), Val);		Val = DAG.getNode(ISD::TRUNCATE, DL, VT.changeTypeToInteger(), Val);
return DAG.getNode(ISD::BITCAST, DL, VT, Val);		return DAG.getNode(ISD::BITCAST, DL, VT, Val);
}		}
}		}

SDValue		SDValue
		AArch64TargetLowering::LowerVECTOR_DEINTERLEAVE(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		EVT OpVT = Op.getValueType();
		assert(OpVT.isScalableVector() &&
		"Expected scalable vector in LowerVECTOR_DEINTERLEAVE.");
		paulwalker-arm (inline comment, Done): I think `Expected scalable vector` better reflects the assert.
		SDValue Even = DAG.getNode(AArch64ISD::UZP1, DL, OpVT, Op.getOperand(0),
		Op.getOperand(1));
		SDValue Odd = DAG.getNode(AArch64ISD::UZP2, DL, OpVT, Op.getOperand(0),
		Op.getOperand(1));
		return DAG.getMergeValues({Even, Odd}, DL);
		paulwalker-arm (inline comment, Not Done): I think `Op->getVTList()` should work here?
		luke (inline comment, Done): Could you use `DAG.getMergeValues` here too?
		CarolineConcatto (author, inline comment, Done): Thank you Luke, also missed that function when looking at MERGE.
		}

		SDValue AArch64TargetLowering::LowerVECTOR_INTERLEAVE(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		EVT OpVT = Op.getValueType();
		assert(OpVT.isScalableVector() &&
		"Expected scalable vector in LowerVECTOR_INTERLEAVE.");

		paulwalker-arm (inline comment, Done): Same comment as with LowerVECTOR_DEINTERLEAVE.
		SDValue Lo = DAG.getNode(AArch64ISD::ZIP1, DL, OpVT, Op.getOperand(0),
		Op.getOperand(1));
		SDValue Hi = DAG.getNode(AArch64ISD::ZIP2, DL, OpVT, Op.getOperand(0),
		Op.getOperand(1));
		return DAG.getMergeValues({Lo, Hi}, DL);
		}
		paulwalker-arm (inline comment, Done): Same comment as with LowerVECTOR_DEINTERLEAVE.

		SDValue
AArch64TargetLowering::LowerFixedLengthFPToIntToSVE(SDValue Op,		AArch64TargetLowering::LowerFixedLengthFPToIntToSVE(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");		assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");

bool IsSigned = Op.getOpcode() == ISD::FP_TO_SINT;		bool IsSigned = Op.getOpcode() == ISD::FP_TO_SINT;
unsigned Opcode = IsSigned ? AArch64ISD::FCVTZS_MERGE_PASSTHRU		unsigned Opcode = IsSigned ? AArch64ISD::FCVTZS_MERGE_PASSTHRU
: AArch64ISD::FCVTZU_MERGE_PASSTHRU;		: AArch64ISD::FCVTZU_MERGE_PASSTHRU;
(398 lines not shown)
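
The two lowering functions above each emit a pair of permutes: UZP1/UZP2 for deinterleave and ZIP1/ZIP2 for interleave. As a rough illustration of the lane movement only, the sketch below models those four operations on plain `std::vector<int>` values. The helper names (`uzp1`, `uzp2`, `zip1`, `zip2`, `strided`, `zipHalf`) and the fixed lane counts are assumptions made for this example; it is not LLVM code and it ignores scalable sizing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Picks every second lane of the concatenation a:b, starting at `start`.
static Vec strided(const Vec &a, const Vec &b, std::size_t start) {
  assert(a.size() == b.size() && start < 2);
  Vec cat(a);
  cat.insert(cat.end(), b.begin(), b.end());
  Vec out;
  for (std::size_t i = start; i < cat.size(); i += 2)
    out.push_back(cat[i]);
  return out;
}

// UZP1 keeps the even lanes of a:b, UZP2 keeps the odd lanes.
Vec uzp1(const Vec &a, const Vec &b) { return strided(a, b, 0); }
Vec uzp2(const Vec &a, const Vec &b) { return strided(a, b, 1); }

// Interleaves one half of a with the same half of b, starting at `first`.
static Vec zipHalf(const Vec &a, const Vec &b, std::size_t first) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  Vec out;
  for (std::size_t i = first; i < first + a.size() / 2; ++i) {
    out.push_back(a[i]);
    out.push_back(b[i]);
  }
  return out;
}

// ZIP1 interleaves the low halves, ZIP2 the high halves.
Vec zip1(const Vec &a, const Vec &b) { return zipHalf(a, b, 0); }
Vec zip2(const Vec &a, const Vec &b) { return zipHalf(a, b, a.size() / 2); }
```

In this model, splitting a wide vector into low and high halves and applying `uzp1`/`uzp2` yields the even and odd lanes; feeding those back through `zip1`/`zip2` reproduces the two halves, which mirrors how the two nodes are meant to be inverses of each other.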

llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s

				define {<2 x half>, <2 x half>} @vector_deinterleave_v2f16_v4f16(<4 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f16_v4f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-NEXT: dup v1.2s, v0.s[1]
				; CHECK-NEXT: mov v2.16b, v0.16b
				; CHECK-NEXT: mov v2.h[1], v1.h[0]
				; CHECK-NEXT: mov v1.h[0], v0.h[1]
				; CHECK-NEXT: // kill: def $d1 killed $d1 killed $q1
				; CHECK-NEXT: fmov d0, d2
				; CHECK-NEXT: ret
				%retval = call {<2 x half>, <2 x half>} @llvm.experimental.vector.deinterleave2.v4f16(<4 x half> %vec)
				ret {<2 x half>, <2 x half>} %retval
				}

				define {<4 x half>, <4 x half>} @vector_deinterleave_v4f16_v8f16(<8 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4f16_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v1.16b, v0.16b, v0.16b, #8
				; CHECK-NEXT: uzp1 v2.4h, v0.4h, v1.4h
				; CHECK-NEXT: uzp2 v1.4h, v0.4h, v1.4h
				; CHECK-NEXT: fmov d0, d2
				; CHECK-NEXT: ret
				%retval = call {<4 x half>, <4 x half>} @llvm.experimental.vector.deinterleave2.v8f16(<8 x half> %vec)
				ret {<4 x half>, <4 x half>} %retval
				}

				define {<8 x half>, <8 x half>} @vector_deinterleave_v8f16_v16f16(<16 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_v8f16_v16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v2.8h, v0.8h, v1.8h
				; CHECK-NEXT: uzp2 v1.8h, v0.8h, v1.8h
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call {<8 x half>, <8 x half>} @llvm.experimental.vector.deinterleave2.v16f16(<16 x half> %vec)
				ret {<8 x half>, <8 x half>} %retval
				}

				define {<2 x float>, <2 x float>} @vector_deinterleave_v2f32_v4f32(<4 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f32_v4f32:
				; CHECK: // %bb.0:
				paulwalker-arm (inline comment, Not Done): Can you move this function before `vector_deinterleave_v2f32_v4f32` to keep the related element types together.
				; CHECK-NEXT: ext v1.16b, v0.16b, v0.16b, #8
				; CHECK-NEXT: zip1 v2.2s, v0.2s, v1.2s
				; CHECK-NEXT: zip2 v1.2s, v0.2s, v1.2s
				; CHECK-NEXT: fmov d0, d2
				; CHECK-NEXT: ret
				%retval = call {<2 x float>, <2 x float>} @llvm.experimental.vector.deinterleave2.v4f32(<4 x float> %vec)
				ret {<2 x float>, <2 x float>} %retval
				}

				define {<4 x float>, <4 x float>} @vector_deinterleave_v4f32_v8f32(<8 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4f32_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v2.4s, v0.4s, v1.4s
				; CHECK-NEXT: uzp2 v1.4s, v0.4s, v1.4s
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call {<4 x float>, <4 x float>} @llvm.experimental.vector.deinterleave2.v8f32(<8 x float> %vec)
				ret {<4 x float>, <4 x float>} %retval
				}

				define {<2 x double>, <2 x double>} @vector_deinterleave_v2f64_v4f64(<4 x double> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2f64_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.2d, v0.2d, v1.2d
				; CHECK-NEXT: zip2 v1.2d, v0.2d, v1.2d
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double> %vec)
				ret {<2 x double>, <2 x double>} %retval
				}

				; Integers

				define {<16 x i8>, <16 x i8>} @vector_deinterleave_v16i8_v32i8(<32 x i8> %vec) {
				; CHECK-LABEL: vector_deinterleave_v16i8_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v2.16b, v0.16b, v1.16b
				; CHECK-NEXT: uzp2 v1.16b, v0.16b, v1.16b
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call {<16 x i8>, <16 x i8>} @llvm.experimental.vector.deinterleave2.v32i8(<32 x i8> %vec)
				ret {<16 x i8>, <16 x i8>} %retval
				}

				define {<8 x i16>, <8 x i16>} @vector_deinterleave_v8i16_v16i16(<16 x i16> %vec) {
				; CHECK-LABEL: vector_deinterleave_v8i16_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v2.8h, v0.8h, v1.8h
				; CHECK-NEXT: uzp2 v1.8h, v0.8h, v1.8h
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call {<8 x i16>, <8 x i16>} @llvm.experimental.vector.deinterleave2.v16i16(<16 x i16> %vec)
				ret {<8 x i16>, <8 x i16>} %retval
				}

				define {<4 x i32>, <4 x i32>} @vector_deinterleave_v4i32_v8i32(<8 x i32> %vec) {
				; CHECK-LABEL: vector_deinterleave_v4i32_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v2.4s, v0.4s, v1.4s
				; CHECK-NEXT: uzp2 v1.4s, v0.4s, v1.4s
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call {<4 x i32>, <4 x i32>} @llvm.experimental.vector.deinterleave2.v8i32(<8 x i32> %vec)
				ret {<4 x i32>, <4 x i32>} %retval
				}

				define {<2 x i64>, <2 x i64>} @vector_deinterleave_v2i64_v4i64(<4 x i64> %vec) {
				; CHECK-LABEL: vector_deinterleave_v2i64_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.2d, v0.2d, v1.2d
				; CHECK-NEXT: zip2 v1.2d, v0.2d, v1.2d
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call {<2 x i64>, <2 x i64>} @llvm.experimental.vector.deinterleave2.v4i64(<4 x i64> %vec)
				ret {<2 x i64>, <2 x i64>} %retval
				}


				; Floating declarations
				declare {<2 x half>,<2 x half>} @llvm.experimental.vector.deinterleave2.v4f16(<4 x half>)
				declare {<4 x half>, <4 x half>} @llvm.experimental.vector.deinterleave2.v8f16(<8 x half>)
				declare {<2 x float>, <2 x float>} @llvm.experimental.vector.deinterleave2.v4f32(<4 x float>)
				paulwalker-arm (inline comment, Done): I don't think this test offers any value. It's really showing how `VECTOR_SHUFFLE` is legalised, which this patch doesn't care about.
				declare {<8 x half>, <8 x half>} @llvm.experimental.vector.deinterleave2.v16f16(<16 x half>)
				declare {<4 x float>, <4 x float>} @llvm.experimental.vector.deinterleave2.v8f32(<8 x float>)
				declare {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double>)

				; Integer declarations
				declare {<16 x i8>, <16 x i8>} @llvm.experimental.vector.deinterleave2.v32i8(<32 x i8>)
				declare {<8 x i16>, <8 x i16>} @llvm.experimental.vector.deinterleave2.v16i16(<16 x i16>)
				declare {<4 x i32>, <4 x i32>} @llvm.experimental.vector.deinterleave2.v8i32(<8 x i32>)
				declare {<2 x i64>, <2 x i64>} @llvm.experimental.vector.deinterleave2.v4i64(<4 x i64>)
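
Several of the checks above for results with two elements per operand (for example `zip1 v2.2d, v0.2d, v1.2d`) show `zip1`/`zip2` where the wider cases show `uzp1`/`uzp2`. Whatever the backend's reason for that choice, the two permute pairs do select the same lanes when each operand has exactly two elements. The sketch below, using hypothetical `uzp` and `zip` templates over `std::array` rather than any LLVM API, shows the coincidence at two lanes and the difference at four.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t N> using V = std::array<int, N>;

// start == 0 models UZP1 (even lanes of a:b), start == 1 models UZP2 (odd lanes).
template <std::size_t N>
V<N> uzp(const V<N> &a, const V<N> &b, std::size_t start) {
  assert(start < 2);
  V<N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t src = 2 * i + start; // index into the concatenation a:b
    out[i] = src < N ? a[src] : b[src - N];
  }
  return out;
}

// half == 0 models ZIP1 (low halves), half == 1 models ZIP2 (high halves).
template <std::size_t N>
V<N> zip(const V<N> &a, const V<N> &b, std::size_t half) {
  static_assert(N % 2 == 0, "even lane count expected");
  assert(half < 2);
  V<N> out{};
  std::size_t base = half * (N / 2);
  for (std::size_t i = 0; i < N / 2; ++i) {
    out[2 * i] = a[base + i];
    out[2 * i + 1] = b[base + i];
  }
  return out;
}
```

With `N == 2`, `uzp(a, b, 0)` and `zip(a, b, 0)` both produce `{a[0], b[0]}`, and the `1` variants both produce `{a[1], b[1]}`; with `N == 4` the results differ.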

llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s

				define <4 x half> @interleave2_v4f16(<2 x half> %vec0, <2 x half> %vec1) {
				; CHECK-LABEL: interleave2_v4f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v0.4h, v0.4h, v1.4h
				; CHECK-NEXT: ret
				%retval = call <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half> %vec0, <2 x half> %vec1)
				ret <4 x half> %retval
				}

				define <8 x half> @interleave2_v8f16(<4 x half> %vec0, <4 x half> %vec1) {
				; CHECK-LABEL: interleave2_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI1_0
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: mov v0.d[1], v1.d[0]
				; CHECK-NEXT: ldr q1, [x8, :lo12:.LCPI1_0]
				; CHECK-NEXT: tbl v0.16b, { v0.16b }, v1.16b
				; CHECK-NEXT: ret
				%retval = call <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half> %vec0, <4 x half> %vec1)
				ret <8 x half> %retval
				}

				define <16 x half> @interleave2_v16f16(<8 x half> %vec0, <8 x half> %vec1) {
				; CHECK-LABEL: interleave2_v16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.8h, v0.8h, v1.8h
				; CHECK-NEXT: zip2 v1.8h, v0.8h, v1.8h
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half> %vec0, <8 x half> %vec1)
				ret <16 x half> %retval
				}

				define <4 x float> @interleave2_v4f32(<2 x float> %vec0, <2 x float> %vec1) {
				; CHECK-LABEL: interleave2_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: mov v0.d[1], v1.d[0]
				; CHECK-NEXT: rev64 v1.4s, v0.4s
				; CHECK-NEXT: uzp1 v0.4s, v0.4s, v1.4s
				; CHECK-NEXT: ret
				%retval = call <4 x float> @llvm.experimental.vector.interleave2.v4f32(<2 x float> %vec0, <2 x float> %vec1)
				ret <4 x float> %retval
				}

				define <8 x float> @interleave2_v8f32(<4 x float> %vec0, <4 x float> %vec1) {
				; CHECK-LABEL: interleave2_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.4s, v0.4s, v1.4s
				; CHECK-NEXT: zip2 v1.4s, v0.4s, v1.4s
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call <8 x float> @llvm.experimental.vector.interleave2.v8f32(<4 x float> %vec0, <4 x float> %vec1)
				ret <8 x float> %retval
				}

				define <4 x double> @interleave2_v4f64(<2 x double> %vec0, <2 x double> %vec1) {
				; CHECK-LABEL: interleave2_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.2d, v0.2d, v1.2d
				; CHECK-NEXT: zip2 v1.2d, v0.2d, v1.2d
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call <4 x double>@llvm.experimental.vector.interleave2.v4f64(<2 x double> %vec0, <2 x double> %vec1)
				ret <4 x double> %retval
				}

				; Integers

				define <32 x i8> @interleave2_v32i8(<16 x i8> %vec0, <16 x i8> %vec1) {
				; CHECK-LABEL: interleave2_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.16b, v0.16b, v1.16b
				; CHECK-NEXT: zip2 v1.16b, v0.16b, v1.16b
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call <32 x i8> @llvm.experimental.vector.interleave2.v32i8(<16 x i8> %vec0, <16 x i8> %vec1)
				ret <32 x i8> %retval
				}

				define <16 x i16> @interleave2_v16i16(<8 x i16> %vec0, <8 x i16> %vec1) {
				; CHECK-LABEL: interleave2_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.8h, v0.8h, v1.8h
				; CHECK-NEXT: zip2 v1.8h, v0.8h, v1.8h
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call <16 x i16> @llvm.experimental.vector.interleave2.v16i16(<8 x i16> %vec0, <8 x i16> %vec1)
				ret <16 x i16> %retval
				}

				define <8 x i32> @interleave2_v8i32(<4 x i32> %vec0, <4 x i32> %vec1) {
				; CHECK-LABEL: interleave2_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.4s, v0.4s, v1.4s
				; CHECK-NEXT: zip2 v1.4s, v0.4s, v1.4s
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call <8 x i32> @llvm.experimental.vector.interleave2.v8i32(<4 x i32> %vec0, <4 x i32> %vec1)
				ret <8 x i32> %retval
				}

				define <4 x i64> @interleave2_v4i64(<2 x i64> %vec0, <2 x i64> %vec1) {
				; CHECK-LABEL: interleave2_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v2.2d, v0.2d, v1.2d
				; CHECK-NEXT: zip2 v1.2d, v0.2d, v1.2d
				; CHECK-NEXT: mov v0.16b, v2.16b
				; CHECK-NEXT: ret
				%retval = call <4 x i64> @llvm.experimental.vector.interleave2.v4i64(<2 x i64> %vec0, <2 x i64> %vec1)
				ret <4 x i64> %retval
				}


				; Float declarations
				declare <4 x half> @llvm.experimental.vector.interleave2.v4f16(<2 x half>, <2 x half>)
				declare <8 x half> @llvm.experimental.vector.interleave2.v8f16(<4 x half>, <4 x half>)
				paulwalker-arm (inline comment, Done): As with the deinterleave case I don't think this is testing anything the patch really cares about and my worry is it'll just cause unnecessary pain for unrelated patches. If we plan to significantly improve the code generation then fine but if not then perhaps they're best removed?
				declare <16 x half> @llvm.experimental.vector.interleave2.v16f16(<8 x half>, <8 x half>)
				declare <4 x float> @llvm.experimental.vector.interleave2.v4f32(<2 x float>, <2 x float>)
				declare <8 x float> @llvm.experimental.vector.interleave2.v8f32(<4 x float>, <4 x float>)
				declare <4 x double> @llvm.experimental.vector.interleave2.v4f64(<2 x double>, <2 x double>)

				; Integer declarations
				declare <32 x i8> @llvm.experimental.vector.interleave2.v32i8(<16 x i8>, <16 x i8>)
				declare <16 x i16> @llvm.experimental.vector.interleave2.v16i16(<8 x i16>, <8 x i16>)
				declare <8 x i32> @llvm.experimental.vector.interleave2.v8i32(<4 x i32>, <4 x i32>)
				declare <4 x i64> @llvm.experimental.vector.interleave2.v4i64(<2 x i64>, <2 x i64>)
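
The inline comments on the fixed-length tests note that they mostly exercise `VECTOR_SHUFFLE` legalisation. Assuming the fixed-length interleave is expressed as a two-input shuffle, as those comments imply, its mask simply alternates between the two inputs. The sketch below builds such a mask and applies it; `interleave2Mask` and `shuffle` are hypothetical helpers written for this illustration, not functions from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the mask that interleaves two inputs of numElts lanes each.
// Indices 0..numElts-1 select from the first input,
// indices numElts..2*numElts-1 select from the second.
std::vector<int> interleave2Mask(int numElts) {
  assert(numElts > 0);
  std::vector<int> mask;
  mask.reserve(2 * static_cast<std::size_t>(numElts));
  for (int i = 0; i < numElts; ++i) {
    mask.push_back(i);
    mask.push_back(numElts + i);
  }
  return mask;
}

// Applies a two-input shuffle mask in the style of shufflevector.
std::vector<int> shuffle(const std::vector<int> &a, const std::vector<int> &b,
                         const std::vector<int> &mask) {
  assert(a.size() == b.size());
  const int n = static_cast<int>(a.size());
  std::vector<int> out;
  out.reserve(mask.size());
  for (int idx : mask) {
    assert(idx >= 0 && idx < 2 * n);
    out.push_back(idx < n ? a[idx] : b[idx - n]);
  }
  return out;
}
```

For two four-lane inputs the mask is `0, 4, 1, 5, 2, 6, 3, 7`, so the result alternates lanes of the first and second input in order.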

llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+sve2 | FileCheck %s

				define {<vscale x 2 x half>, <vscale x 2 x half>} @vector_deinterleave_nxv2f16_nxv4f16(<vscale x 4 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f16_nxv4f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpkhi z1.d, z0.s
				; CHECK-NEXT: uunpklo z2.d, z0.s
				; CHECK-NEXT: uzp1 z0.d, z2.d, z1.d
				; CHECK-NEXT: uzp2 z1.d, z2.d, z1.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x half>, <vscale x 2 x half>} @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half> %vec)
				ret {<vscale x 2 x half>, <vscale x 2 x half>} %retval
				paulwalker-arm (inline comment, Done): Please can you indent the IR here and for the other functions in this file.
				}

				define {<vscale x 4 x half>, <vscale x 4 x half>} @vector_deinterleave_nxv4f16_nxv8f16(<vscale x 8 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4f16_nxv8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpkhi z1.s, z0.h
				; CHECK-NEXT: uunpklo z2.s, z0.h
				; CHECK-NEXT: uzp1 z0.s, z2.s, z1.s
				; CHECK-NEXT: uzp2 z1.s, z2.s, z1.s
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half> %vec)
				ret {<vscale x 4 x half>, <vscale x 4 x half>} %retval
				}

				define {<vscale x 8 x half>, <vscale x 8 x half>} @vector_deinterleave_nxv8f16_nxv16f16(<vscale x 16 x half> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv8f16_nxv16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.h, z0.h, z1.h
				; CHECK-NEXT: uzp2 z1.h, z0.h, z1.h
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 8 x half>, <vscale x 8 x half>} @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half> %vec)
				ret {<vscale x 8 x half>, <vscale x 8 x half>} %retval
				}

				define {<vscale x 2 x float>, <vscale x 2 x float>} @vector_deinterleave_nxv2f32_nxv4f32(<vscale x 4 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f32_nxv4f32:
				; CHECK: // %bb.0:
				paulwalker-arm (inline comment, Not Done): Can you move this function before `vector_deinterleave_nxv2f32_nxv4f32` to keep the related element types together.
				; CHECK-NEXT: uunpkhi z1.d, z0.s
				; CHECK-NEXT: uunpklo z2.d, z0.s
				; CHECK-NEXT: uzp1 z0.d, z2.d, z1.d
				; CHECK-NEXT: uzp2 z1.d, z2.d, z1.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x float>, <vscale x 2 x float>} @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float> %vec)
				ret {<vscale x 2 x float>, <vscale x 2 x float>} %retval
				}

				define {<vscale x 4 x float>, <vscale x 4 x float>} @vector_deinterleave_nxv4f32_nxv8f32(<vscale x 8 x float> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4f32_nxv8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.s, z0.s, z1.s
				; CHECK-NEXT: uzp2 z1.s, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float> %vec)
				ret {<vscale x 4 x float>, <vscale x 4 x float>} %retval
				}

				define {<vscale x 2 x double>, <vscale x 2 x double>} @vector_deinterleave_nxv2f64_nxv4f64(<vscale x 4 x double> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2f64_nxv4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.d, z0.d, z1.d
				; CHECK-NEXT: uzp2 z1.d, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x double>, <vscale x 2 x double>} @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double> %vec)
				ret {<vscale x 2 x double>, <vscale x 2 x double>} %retval
				}

				; Integers

				define {<vscale x 16 x i8>, <vscale x 16 x i8>} @vector_deinterleave_nxv16i8_nxv32i8(<vscale x 32 x i8> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv16i8_nxv32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.b, z0.b, z1.b
				; CHECK-NEXT: uzp2 z1.b, z0.b, z1.b
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.experimental.vector.deinterleave2.nxv32i8(<vscale x 32 x i8> %vec)
				ret {<vscale x 16 x i8>, <vscale x 16 x i8>} %retval
				}

				define {<vscale x 8 x i16>, <vscale x 8 x i16>} @vector_deinterleave_nxv8i16_nxv16i16(<vscale x 16 x i16> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv8i16_nxv16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.h, z0.h, z1.h
				; CHECK-NEXT: uzp2 z1.h, z0.h, z1.h
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 8 x i16>, <vscale x 8 x i16>} @llvm.experimental.vector.deinterleave2.nxv16i16(<vscale x 16 x i16> %vec)
				ret {<vscale x 8 x i16>, <vscale x 8 x i16>} %retval
				}

				define {<vscale x 4 x i32>, <vscale x 4 x i32>} @vector_deinterleave_nxv4i32_nxvv8i32(<vscale x 8 x i32> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4i32_nxvv8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.s, z0.s, z1.s
				; CHECK-NEXT: uzp2 z1.s, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %vec)
				ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %retval
				}

				define {<vscale x 2 x i64>, <vscale x 2 x i64>} @vector_deinterleave_nxv2i64_nxv4i64(<vscale x 4 x i64> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.d, z0.d, z1.d
				; CHECK-NEXT: uzp2 z1.d, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %vec)
				ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
				}

				; Predicated
				define {<vscale x 16 x i1>, <vscale x 16 x i1>} @vector_deinterleave_nxv16i1_nxv32i1(<vscale x 32 x i1> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv16i1_nxv32i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 p2.b, p0.b, p1.b
				; CHECK-NEXT: uzp2 p1.b, p0.b, p1.b
				; CHECK-NEXT: mov p0.b, p2.b
				; CHECK-NEXT: ret
				%retval = call {<vscale x 16 x i1>, <vscale x 16 x i1>} @llvm.experimental.vector.deinterleave2.nxv32i1(<vscale x 32 x i1> %vec)
				ret {<vscale x 16 x i1>, <vscale x 16 x i1>} %retval
				}

				define {<vscale x 8 x i1>, <vscale x 8 x i1>} @vector_deinterleave_nxv8i1_nxv16i1(<vscale x 16 x i1> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv8i1_nxv16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: punpkhi p1.h, p0.b
				; CHECK-NEXT: punpklo p2.h, p0.b
				; CHECK-NEXT: uzp1 p0.h, p2.h, p1.h
				; CHECK-NEXT: uzp2 p1.h, p2.h, p1.h
				; CHECK-NEXT: ret
				%retval = call {<vscale x 8 x i1>, <vscale x 8 x i1>} @llvm.experimental.vector.deinterleave2.nxv16i1(<vscale x 16 x i1> %vec)
				ret {<vscale x 8 x i1>, <vscale x 8 x i1>} %retval
				}

				define {<vscale x 4 x i1>, <vscale x 4 x i1>} @vector_deinterleave_nxv4i1_nxv8i1(<vscale x 8 x i1> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv4i1_nxv8i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: punpkhi p1.h, p0.b
				; CHECK-NEXT: punpklo p2.h, p0.b
				; CHECK-NEXT: uzp1 p0.s, p2.s, p1.s
				; CHECK-NEXT: uzp2 p1.s, p2.s, p1.s
				; CHECK-NEXT: ret
				%retval = call {<vscale x 4 x i1>, <vscale x 4 x i1>} @llvm.experimental.vector.deinterleave2.nxv8i1(<vscale x 8 x i1> %vec)
				ret {<vscale x 4 x i1>, <vscale x 4 x i1>} %retval
				}

				define {<vscale x 2 x i1>, <vscale x 2 x i1>} @vector_deinterleave_nxv2i1_nxv4i1(<vscale x 4 x i1> %vec) {
				; CHECK-LABEL: vector_deinterleave_nxv2i1_nxv4i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: punpkhi p1.h, p0.b
				; CHECK-NEXT: punpklo p2.h, p0.b
				; CHECK-NEXT: uzp1 p0.d, p2.d, p1.d
				; CHECK-NEXT: uzp2 p1.d, p2.d, p1.d
				; CHECK-NEXT: ret
				%retval = call {<vscale x 2 x i1>, <vscale x 2 x i1>} @llvm.experimental.vector.deinterleave2.nxv4i1(<vscale x 4 x i1> %vec)
				ret {<vscale x 2 x i1>, <vscale x 2 x i1>} %retval
				}


				; Floating declarations
				declare {<vscale x 2 x half>,<vscale x 2 x half>} @llvm.experimental.vector.deinterleave2.nxv4f16(<vscale x 4 x half>)
				declare {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.experimental.vector.deinterleave2.nxv8f16(<vscale x 8 x half>)
				declare {<vscale x 2 x float>, <vscale x 2 x float>} @llvm.experimental.vector.deinterleave2.nxv4f32(<vscale x 4 x float>)
				declare {<vscale x 8 x half>, <vscale x 8 x half>} @llvm.experimental.vector.deinterleave2.nxv16f16(<vscale x 16 x half>)
				declare {<vscale x 4 x float>, <vscale x 4 x float>} @llvm.experimental.vector.deinterleave2.nxv8f32(<vscale x 8 x float>)
				declare {<vscale x 2 x double>, <vscale x 2 x double>} @llvm.experimental.vector.deinterleave2.nxv4f64(<vscale x 4 x double>)

				; Integer declarations
				declare {<vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.experimental.vector.deinterleave2.nxv32i8(<vscale x 32 x i8>)
				declare {<vscale x 8 x i16>, <vscale x 8 x i16>} @llvm.experimental.vector.deinterleave2.nxv16i16(<vscale x 16 x i16>)
				declare {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32>)
				declare {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.experimental.vector.deinterleave2.nxv4i64(<vscale x 4 x i64>)

				; Predicated declarations
				declare {<vscale x 16 x i1>, <vscale x 16 x i1>} @llvm.experimental.vector.deinterleave2.nxv32i1(<vscale x 32 x i1>)
				declare {<vscale x 8 x i1>, <vscale x 8 x i1>} @llvm.experimental.vector.deinterleave2.nxv16i1(<vscale x 16 x i1>)
				declare {<vscale x 4 x i1>, <vscale x 4 x i1>} @llvm.experimental.vector.deinterleave2.nxv8i1(<vscale x 8 x i1>)
				declare {<vscale x 2 x i1>, <vscale x 2 x i1>} @llvm.experimental.vector.deinterleave2.nxv4i1(<vscale x 4 x i1>)
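
For readers checking the expected output above, the lane semantics exercised by these tests can be modelled on fixed-length vectors. This is only a sketch: the helper name deinterleave2 and the use of std::vector are illustrative assumptions, not part of the patch. It returns the even-indexed lanes first and the odd-indexed lanes second, which is the selection the uzp1/uzp2 pairs in the CHECK lines perform when the wide input is held in two registers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical reference model, not LLVM code: split a wide vector into
// its even-indexed lanes (first) and its odd-indexed lanes (second).
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
deinterleave2(const std::vector<T> &wide) {
  // The intrinsic always takes a vector with an even number of lanes.
  assert(wide.size() % 2 == 0 && "expected an even number of lanes");
  std::vector<T> even, odd;
  even.reserve(wide.size() / 2);
  odd.reserve(wide.size() / 2);
  for (std::size_t i = 0; i < wide.size(); i += 2) {
    even.push_back(wide[i]);    // lanes 0, 2, 4, ...
    odd.push_back(wide[i + 1]); // lanes 1, 3, 5, ...
  }
  return {even, odd};
}
```

A scalable vector has a lane count that is only known at run time, so the model covers a single concrete value of vscale at a time.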

llvm/test/CodeGen/AArch64/sve-vector-interleave.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s

				define <vscale x 4 x half> @interleave2_nxv4f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1) {
				; CHECK-LABEL: interleave2_nxv4f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip2 z2.d, z0.d, z1.d
				; CHECK-NEXT: zip1 z0.d, z0.d, z1.d
				; CHECK-NEXT: uzp1 z0.s, z0.s, z2.s
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half> %vec0, <vscale x 2 x half> %vec1)
				ret <vscale x 4 x half> %retval
				}

				define <vscale x 8 x half> @interleave2_nxv8f16(<vscale x 4 x half> %vec0, <vscale x 4 x half> %vec1) {
				; CHECK-LABEL: interleave2_nxv8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip2 z2.s, z0.s, z1.s
				; CHECK-NEXT: zip1 z0.s, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half> %vec0, <vscale x 4 x half> %vec1)
				ret <vscale x 8 x half> %retval
				}

				define <vscale x 16 x half> @interleave2_nxv16f16(<vscale x 8 x half> %vec0, <vscale x 8 x half> %vec1) {
				; CHECK-LABEL: interleave2_nxv16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 z2.h, z0.h, z1.h
				; CHECK-NEXT: zip2 z1.h, z0.h, z1.h
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half> %vec0, <vscale x 8 x half> %vec1)
				ret <vscale x 16 x half> %retval
				}

				define <vscale x 4 x float> @interleave2_nxv4f32(<vscale x 2 x float> %vec0, <vscale x 2 x float> %vec1) {
				; CHECK-LABEL: interleave2_nxv4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip2 z2.d, z0.d, z1.d
				; CHECK-NEXT: zip1 z0.d, z0.d, z1.d
				; CHECK-NEXT: uzp1 z0.s, z0.s, z2.s
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float> %vec0, <vscale x 2 x float> %vec1)
				ret <vscale x 4 x float> %retval
				}

				define <vscale x 8 x float> @interleave2_nxv8f32(<vscale x 4 x float> %vec0, <vscale x 4 x float> %vec1) {
				; CHECK-LABEL: interleave2_nxv8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 z2.s, z0.s, z1.s
				; CHECK-NEXT: zip2 z1.s, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float> %vec0, <vscale x 4 x float> %vec1)
				ret <vscale x 8 x float> %retval
				}

				define <vscale x 4 x double> @interleave2_nxv4f64(<vscale x 2 x double> %vec0, <vscale x 2 x double> %vec1) {
				; CHECK-LABEL: interleave2_nxv4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 z2.d, z0.d, z1.d
				; CHECK-NEXT: zip2 z1.d, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double> %vec0, <vscale x 2 x double> %vec1)
				ret <vscale x 4 x double> %retval
				}

				; Integers

				define <vscale x 32 x i8> @interleave2_nxv32i8(<vscale x 16 x i8> %vec0, <vscale x 16 x i8> %vec1) {
				; CHECK-LABEL: interleave2_nxv32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 z2.b, z0.b, z1.b
				; CHECK-NEXT: zip2 z1.b, z0.b, z1.b
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 32 x i8> @llvm.experimental.vector.interleave2.nxv32i8(<vscale x 16 x i8> %vec0, <vscale x 16 x i8> %vec1)
				ret <vscale x 32 x i8> %retval
				}

				define <vscale x 16 x i16> @interleave2_nxv16i16(<vscale x 8 x i16> %vec0, <vscale x 8 x i16> %vec1) {
				; CHECK-LABEL: interleave2_nxv16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 z2.h, z0.h, z1.h
				; CHECK-NEXT: zip2 z1.h, z0.h, z1.h
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 16 x i16> @llvm.experimental.vector.interleave2.nxv16i16(<vscale x 8 x i16> %vec0, <vscale x 8 x i16> %vec1)
				ret <vscale x 16 x i16> %retval
				}

				define <vscale x 8 x i32> @interleave2_nxv8i32(<vscale x 4 x i32> %vec0, <vscale x 4 x i32> %vec1) {
				; CHECK-LABEL: interleave2_nxv8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 z2.s, z0.s, z1.s
				; CHECK-NEXT: zip2 z1.s, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> %vec0, <vscale x 4 x i32> %vec1)
				ret <vscale x 8 x i32> %retval
				}

				define <vscale x 4 x i64> @interleave2_nxv4i64(<vscale x 2 x i64> %vec0, <vscale x 2 x i64> %vec1) {
				; CHECK-LABEL: interleave2_nxv4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 z2.d, z0.d, z1.d
				; CHECK-NEXT: zip2 z1.d, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x i64> @llvm.experimental.vector.interleave2.nxv4i64(<vscale x 2 x i64> %vec0, <vscale x 2 x i64> %vec1)
				ret <vscale x 4 x i64> %retval
				}

				; Predicated

				define <vscale x 32 x i1> @interleave2_nxv32i1(<vscale x 16 x i1> %vec0, <vscale x 16 x i1> %vec1) {
				; CHECK-LABEL: interleave2_nxv32i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 p2.b, p0.b, p1.b
				; CHECK-NEXT: zip2 p1.b, p0.b, p1.b
				; CHECK-NEXT: mov p0.b, p2.b
				; CHECK-NEXT: ret
				%retval = call <vscale x 32 x i1> @llvm.experimental.vector.interleave2.nxv32i1(<vscale x 16 x i1> %vec0, <vscale x 16 x i1> %vec1)
				ret <vscale x 32 x i1> %retval
				}

				define <vscale x 16 x i1> @interleave2_nxv16i1(<vscale x 8 x i1> %vec0, <vscale x 8 x i1> %vec1) {
				; CHECK-LABEL: interleave2_nxv16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip2 p2.h, p0.h, p1.h
				; CHECK-NEXT: zip1 p0.h, p0.h, p1.h
				; CHECK-NEXT: uzp1 p0.b, p0.b, p2.b
				; CHECK-NEXT: ret
				%retval = call <vscale x 16 x i1> @llvm.experimental.vector.interleave2.nxv16i1(<vscale x 8 x i1> %vec0, <vscale x 8 x i1> %vec1)
				ret <vscale x 16 x i1> %retval
				}

				define <vscale x 8 x i1> @interleave2_nxv8i1(<vscale x 4 x i1> %vec0, <vscale x 4 x i1> %vec1) {
				; CHECK-LABEL: interleave2_nxv8i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip2 p2.s, p0.s, p1.s
				; CHECK-NEXT: zip1 p0.s, p0.s, p1.s
				; CHECK-NEXT: uzp1 p0.h, p0.h, p2.h
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x i1> @llvm.experimental.vector.interleave2.nxv8i1(<vscale x 4 x i1> %vec0, <vscale x 4 x i1> %vec1)
				ret <vscale x 8 x i1> %retval
				}

				define <vscale x 4 x i1> @interleave2_nxv4i1(<vscale x 2 x i1> %vec0, <vscale x 2 x i1> %vec1) {
				; CHECK-LABEL: interleave2_nxv4i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip2 p2.d, p0.d, p1.d
				; CHECK-NEXT: zip1 p0.d, p0.d, p1.d
				; CHECK-NEXT: uzp1 p0.s, p0.s, p2.s
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x i1> @llvm.experimental.vector.interleave2.nxv4i1(<vscale x 2 x i1> %vec0, <vscale x 2 x i1> %vec1)
				ret <vscale x 4 x i1> %retval
				}


				; Float declarations
				declare <vscale x 4 x half> @llvm.experimental.vector.interleave2.nxv4f16(<vscale x 2 x half>, <vscale x 2 x half>)
				declare <vscale x 8 x half> @llvm.experimental.vector.interleave2.nxv8f16(<vscale x 4 x half>, <vscale x 4 x half>)
				declare <vscale x 16 x half> @llvm.experimental.vector.interleave2.nxv16f16(<vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 4 x float> @llvm.experimental.vector.interleave2.nxv4f32(<vscale x 2 x float>, <vscale x 2 x float>)
				declare <vscale x 8 x float> @llvm.experimental.vector.interleave2.nxv8f32(<vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 4 x double> @llvm.experimental.vector.interleave2.nxv4f64(<vscale x 2 x double>, <vscale x 2 x double>)

				; Integer declarations
				declare <vscale x 32 x i8> @llvm.experimental.vector.interleave2.nxv32i8(<vscale x 16 x i8>, <vscale x 16 x i8>)
				declare <vscale x 16 x i16> @llvm.experimental.vector.interleave2.nxv16i16(<vscale x 8 x i16>, <vscale x 8 x i16>)
				declare <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
				declare <vscale x 4 x i64> @llvm.experimental.vector.interleave2.nxv4i64(<vscale x 2 x i64>, <vscale x 2 x i64>)

				; Predicated
				declare <vscale x 32 x i1> @llvm.experimental.vector.interleave2.nxv32i1(<vscale x 16 x i1>, <vscale x 16 x i1>)
				declare <vscale x 16 x i1> @llvm.experimental.vector.interleave2.nxv16i1(<vscale x 8 x i1>, <vscale x 8 x i1>)
				declare <vscale x 8 x i1> @llvm.experimental.vector.interleave2.nxv8i1(<vscale x 4 x i1>, <vscale x 4 x i1>)
				declare <vscale x 4 x i1> @llvm.experimental.vector.interleave2.nxv4i1(<vscale x 2 x i1>, <vscale x 2 x i1>)
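
The inverse operation tested in this file can be modelled the same way. Again this is a sketch under stated assumptions: interleave2 here is a hypothetical helper on std::vector, not code from the patch. Lane i of the first operand lands in lane 2*i of the result and lane i of the second operand in lane 2*i+1. In the CHECK lines for the full-width cases, zip1 produces the low half of that result and zip2 the high half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference model, not LLVM code: build a vector twice as
// wide whose even lanes come from `even` and whose odd lanes come from `odd`.
template <typename T>
std::vector<T> interleave2(const std::vector<T> &even,
                           const std::vector<T> &odd) {
  // Both operands of the intrinsic have the same type, hence the same length.
  assert(even.size() == odd.size() && "operands must have equal length");
  std::vector<T> wide;
  wide.reserve(even.size() * 2);
  for (std::size_t i = 0; i < even.size(); ++i) {
    wide.push_back(even[i]); // result lane 2*i
    wide.push_back(odd[i]);  // result lane 2*i + 1
  }
  return wide;
}
```

Applying this model to the two halves produced by a deinterleave of the same width gives back the original vector, which is the round-trip property that makes the pair of intrinsics suitable for complex-number vectorization.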

Revision Contents

Diff 498800
