This is an archive of the discontinued LLVM Phabricator instance.


Differential D94444

[RFC][Scalable] Add scalable shuffle intrinsic to extract evens from a pair of vectors
Needs ReviewPublic

Authored by cameron.mcinally on Jan 11 2021, 12:51 PM.


Details

Reviewers

efriedma
sdesmalen
rengolin
rsandifo-arm
paulwalker-arm
david-arm

Summary

Here's a proposal for a scalable shuffle intrinsic that extracts the even elements from a pair of vectors. It is the first in the set that was originally discussed in the llvm-dev thread here:

https://lists.llvm.org/pipermail/llvm-dev/2020-January/138762.html

The following are some design decisions I made that could use discussion:

I chose to extract the even elements from a pair of vectors (full vector result), rather than from a single vector (half-width vector result). This is in line with the existing fixed-width shuffle vectors, and can be extended to accept an undef argument if needed. The motivation behind this decision was that we'd want the result vector to be a full vector for performance reasons. It would also map well to SVE's LD2 and UZP1.

How do we feel about the intrinsic name: llvm.experimental.vector.extract.evens(...)?

How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to allow fixed-width arguments as well. Currently the fixed-width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.

Has anyone thought through all the legalizations that are valid on scalable vectors? Promote and Split are obviously valid. Scalarize is obviously invalid. How about Widen? Widen conflicts with the existing unpacked scalable vectors, so it's not clear if it's possible to do.
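To pin down the first design decision above, here is a minimal C++ reference model of the proposed two-operand semantics over fixed-length `std::vector`s. It is an illustration only, not code from the patch, and `extractEvens` is a hypothetical name. The result has the same length as each operand and holds the even-indexed elements of the concatenation `vec1:vec2`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of the proposed two-operand "extract evens" operation:
// returns the even-indexed elements of the concatenation vec1:vec2.
// The result has the same length as each input (a "full vector" result).
template <typename T>
std::vector<T> extractEvens(const std::vector<T>& vec1,
                            const std::vector<T>& vec2) {
  assert(vec1.size() == vec2.size() && "operands must have the same type");
  const std::size_t n = vec1.size();
  std::vector<T> result;
  result.reserve(n);
  // Element i of the concatenation comes from vec1 if i < n, else from vec2.
  for (std::size_t i = 0; i < 2 * n; i += 2)
    result.push_back(i < n ? vec1[i] : vec2[i - n]);
  return result;
}
```

With `a = {0, 1, 2, 3}` and `b = {4, 5, 6, 7}`, `extractEvens(a, b)` returns `{0, 2, 4, 6}`: a full vector rather than a half-width one.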


Event Timeline

cameron.mcinally created this revision. Jan 11 2021, 12:51 PM

Herald added subscribers: dexonsmith, jdoerfert, hiraditya. Jan 11 2021, 12:51 PM

cameron.mcinally requested review of this revision. Jan 11 2021, 12:51 PM

Herald added a project: Restricted Project. Jan 11 2021, 12:51 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B84750: Diff 315880. Jan 11 2021, 2:09 PM

Thanks for creating this patch!

I chose to extract the even elements from a pair of vectors (full vector result), rather than from a single vector (half-width vector result). This is in line with the existing fixed-width shuffle vectors, and can be extended to accept an undef argument if needed. The motivation behind this decision was that we'd want the result vector to be a full vector for performance reasons. It would also map well to SVE's LD2 and UZP1.

Are you also planning to add intrinsics for interleaving?

How do we feel about the intrinsic name: llvm.experimental.vector.extract.evens(...)?

I quite like the odd/even terminology, but would prefer to drop the s, as in: "extracting the even elements".

How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to allow fixed-width arguments as well. Currently the fixed-width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.

I would favour not adding SCALABLE to that name, because there is nothing limiting these nodes from being used for fixed-width vectors.

Has anyone thought through all the legalizations that are valid on scalable vectors? Promote and Split are obviously valid. Scalarize is obviously invalid. How about Widen? Widen conflicts with the existing unpacked scalable vectors, so it's not clear if it's possible to do.

You're right, I don't think widening is always possible with the current definition. It's not really about how the vectors are laid out in the registers; taking the even elements of <vscale x 3 x i32> means that for each 3 x i32 part you'd need to alternate between selecting the even, odd, even, odd, ... elements. That problem goes away if the intrinsic requires (at least for scalable vectors) that the minimum number of elements be a multiple of 2.
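The alternation can be checked with a small C++ sketch (a hypothetical helper, illustration only). It assumes the runtime vector consists of `vscale` consecutive parts, each holding the known minimum number of elements, and lists which local lanes of part `k` hold globally even elements. With a part size of 3 the selection flips between even and odd lanes from one part to the next; with a part size of 4 it is the same in every part.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Part `part` of a vector made of equal parts of `partSize` elements starts
// at global index part * partSize. Return the local lane indices within that
// part whose global index is even.
std::vector<std::size_t> localEvenIndices(std::size_t partSize,
                                          std::size_t part) {
  assert(partSize > 0 && "a part holds at least one element");
  std::vector<std::size_t> local;
  const std::size_t start = part * partSize;
  for (std::size_t j = 0; j < partSize; ++j)
    if ((start + j) % 2 == 0)
      local.push_back(j);
  return local;
}
```

For example, `localEvenIndices(3, 0)` gives `{0, 2}` and `localEvenIndices(3, 1)` gives `{1}`, whereas `localEvenIndices(4, k)` gives `{0, 2}` for every `k`.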

llvm/docs/LangRef.rst
16201	Is it worth adding the clarification that it extracts the even elements from a concatenated vector %vec1:%vec2?
llvm/include/llvm/CodeGen/ISDOpcodes.h
543	Probably the same as above, clarify that it returns a vector containing all even elements of VEC1:VEC2.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6997	nit: SmallVector chooses a default N since D92522 and https://llvm.org/docs/ProgrammersManual.html recommends leaving out N unless there is a well-motivated choice.
llvm/lib/IR/Verifier.cpp
5171–5178	Can these two Asserts be merged into one?

Hi all, this is just a thought and I hope I'm not confusing things further (!), but we could also have something like:

llvm.experimental.vector.deinterleave.even/odd
llvm.experimental.vector.interleave.lo/hi

since we are actually performing deinterleaving operations in this patch, and I assume we'll want the matching interleaving ops at some point too? If you wanted to reduce the number of intrinsics and ISD opcodes, you could also pass even/odd as a third flag, i.e.

llvm.experimental.vector.deinterleave(<>,<>, i1)

although I'm happy with separate intrinsics/opcodes too!

For what it's worth if we stick with something like llvm.experimental.vector.extract.even(s) I agree with Sander and would prefer to drop the 's' at the end.
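One way to read the suggested naming is as a pair of C++ reference models over fixed-length vectors. This is an illustration only: the `bool` parameters stand in for the suggested `i1` flag, and the lo/hi semantics (the low and high halves of the fully interleaved result) are an assumption rather than anything settled in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// deinterleave(vec1, vec2, odd): the even (odd == false) or odd
// (odd == true) elements of the concatenation vec1:vec2.
template <typename T>
std::vector<T> deinterleave(const std::vector<T>& vec1,
                            const std::vector<T>& vec2, bool odd) {
  assert(vec1.size() == vec2.size() && "operands must have the same type");
  const std::size_t n = vec1.size();
  std::vector<T> result;
  result.reserve(n);
  for (std::size_t i = odd ? 1 : 0; i < 2 * n; i += 2)
    result.push_back(i < n ? vec1[i] : vec2[i - n]);
  return result;
}

// interleave(vec1, vec2, hi): the low (hi == false) or high (hi == true)
// half of the sequence vec1[0], vec2[0], vec1[1], vec2[1], ...
template <typename T>
std::vector<T> interleave(const std::vector<T>& vec1,
                          const std::vector<T>& vec2, bool hi) {
  assert(vec1.size() == vec2.size() && "operands must have the same type");
  const std::size_t n = vec1.size();
  const std::size_t begin = hi ? n : 0;
  std::vector<T> result;
  result.reserve(n);
  // Even positions of the interleaved sequence come from vec1, odd from vec2.
  for (std::size_t i = begin; i < begin + n; ++i)
    result.push_back(i % 2 == 0 ? vec1[i / 2] : vec2[i / 2]);
  return result;
}
```

Under these assumptions the two operations round-trip: deinterleaving `interleave(a, b, false)` and `interleave(a, b, true)` recovers `a` (even) and `b` (odd), which is what makes the names read as antonyms.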

In D94444#2492146, @sdesmalen wrote:

Thanks for creating this patch!

I chose to extract the even elements from a pair of vectors (full vector result), rather than from a single vector (half-width vector result). This is in line with the existing fixed-width shuffle vectors, and can be extended to accept an undef argument if needed. The motivation behind this decision was that we'd want the result vector to be a full vector for performance reasons. It would also map well to SVE's LD2 and UZP1.

Are you also planning to add intrinsics for interleaving?

I am, plus some others for Complex vectorization. I just wanted to work out the kinks with this first example.

How do we feel about the intrinsic name: llvm.experimental.vector.extract.evens(...)?

I quite like the odd/even terminology, but would prefer to drop the s, as in: "extracting the even elements".

I like that too. I also like David's suggestion about deinterleave.even. I don't really have a strong opinion on either though. Does anyone feel strongly either way?

How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to allow fixed-width arguments as well. Currently the fixed-width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.

I would favour not adding SCALABLE to that name, because there is nothing limiting these nodes from being used for fixed-width vectors.

Ok, that's fair. Right now I have an assert in ISelLowering to ensure only scalable types. That could really be removed though, since the UZP1 lowering would also work for fixed types. It might take a little work to clean up, but I don't foresee any problems.

Address some of @sdesmalen's comments, but deferring name changes...

In D94444#2492855, @cameron.mcinally wrote:

I like that too. I also like David's suggestion about deinterleave.even. I don't really have a strong opinion on either though. Does anyone feel strongly either way?

I quite like Dave's suggestion for deinterleave.even/odd and interleave.lo/hi, because now they are related (antonyms) and the names are still intuitive.

How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to allow fixed-width arguments as well. Currently the fixed-width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.

I would favour not adding SCALABLE to that name, because there is nothing limiting these nodes from being used for fixed-width vectors.

Ok, that's fair. Right now I have an assert in ISelLowering to ensure only scalable types. That could really be removed though, since the UZP1 lowering would also work for fixed types. It might take a little work to clean up, but I don't foresee any problems.

It's probably fine either with/without the assert, since there is currently no way to test the fixed-width case.

Updated to @david-arm's suggested naming scheme...

sdesmalen added subscribers: evandro, craig.topper, rogfer01. Jan 13 2021, 9:00 AM

sdesmalen added inline comments. Jan 13 2021, 9:26 AM

llvm/docs/LangRef.rst
16206	AIUI, for scalable vectors the minimum number of elements must be a power of two in order to be able to legalize this operation without widening (not just an even number as I suggested earlier). Can you add that as a restriction?
llvm/lib/IR/Verifier.cpp
5176	Can you also check the constraint that the minimum number of elements must be a power of two for scalable vectors?
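A hedged, standalone C++ sketch of the requested Verifier check follows. `ElementCountModel` is a hypothetical stand-in for `llvm::ElementCount` so that the snippet compiles without LLVM headers; in-tree, the same test would presumably use `ElementCount::isScalable()`, `ElementCount::getKnownMinValue()` and `isPowerOf2_64()` from `llvm/Support/MathExtras.h`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount.
struct ElementCountModel {
  std::uint64_t knownMin; // known minimum number of elements
  bool scalable;          // true for <vscale x N x Ty>
};

// A power of two has exactly one bit set.
constexpr bool isPowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Sketch of the proposed rule: for scalable vectors the known minimum
// element count must be a power of two, so the operation can be legalized
// without widening. No extra restriction is applied to fixed-width vectors.
bool isValidExtractEvensType(const ElementCountModel& ec) {
  assert(ec.knownMin != 0 && "vectors have at least one element");
  if (!ec.scalable)
    return true;
  return isPowerOfTwo(ec.knownMin);
}
```

So `<vscale x 4 x i32>` passes, while `<vscale x 6 x i32>` (even, but not a power of two) and `<vscale x 3 x i32>` are rejected; fixed-width types are left alone.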

Add known minimum number of elements restrictions...

llvm/lib/IR/Verifier.cpp
5176	I'm still getting up to speed with ElementCount, so I'm not sure that this is the best way to use it. Any experts?

A bit of a flyby review as I'm still on holidays but to my mind many of the restrictions being proposed for the new intrinsic seem purely down to the design decision of splitting the input vector across two operands. I understand this is how the underlying instructions work for SVE but that does not seem like a good enough reason to compromise the IR.

So my first questions are whether the IR and ISD interfaces need to match and, from an IR point of view, what is the expected usage? Is having two input operands going to result in the common case of having to "split" the result of a large load? I ask because I recall this being how InterleavedAccess worked with LLVM (i.e. one big load with a set of shuffles to extract the lanes).

My second question is what are the code generation advantages of having multiple operands against the negatives. We know type legalisation is a negative but I'm guessing the advantage is it allows a simpler mapping to the underlying SVE instructions. The question is whether this is worth the cost.

By only having a single input vector I believe the current proposed type restrictions disappear, as widening becomes quite easy. The downside is that some of this type legalisation becomes more complex, but this feels worth it if it means fewer compromises. From an SVE point of view it seems pretty easy to rely on common type legalisation until the input vector is twice the size of the legal type, at which point we custom lower to the relevant AArch64-specific node, which mirrors how we handle things like ZERO_EXTEND today.

My final question relates to future usages and how the intrinsic's idiom scales. Taking the above InterleavedAccess example, there is a requirement to have a stride other than two, for example pixel data will want three or four. One route is to add an intrinsic for each option but I'm wondering if there's any appetite for a single generic intrinsic of the form:

<A x Elt> llvm.experimental.vector.extract.elements( %invec, i32 index, i32 stride)

Where index and stride are required to be constant immediate values with "stride > 0" and "0 <= index < stride".

If it helps we could also initially restrict the range of stride as this is something that can be easily changed with improved code generation abilities. By this I mean with your current patch we can restrict it to being <=2 and still have distinct ISD nodes for these supported variants if that results in the better implementation.
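As a rough C++ reference model of the suggested generic intrinsic (fixed-length `std::vector`s only; `extractElements` is a hypothetical name), `invec` is read as a sequence of records with `stride` fields and the result is field `index` of every record. Under this reading the extract-evens operation is `index = 0, stride = 2`, and the pixel case would use `stride = 3` or `4`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of
//   llvm.experimental.vector.extract.elements(%invec, i32 index, i32 stride)
// Returns invec[index], invec[index + stride], invec[index + 2*stride], ...
template <typename T>
std::vector<T> extractElements(const std::vector<T>& invec,
                               unsigned index, unsigned stride) {
  assert(stride > 0 && index < stride && "requires 0 <= index < stride");
  std::vector<T> result;
  for (std::size_t i = index; i < invec.size(); i += stride)
    result.push_back(invec[i]);
  return result;
}
```

With `v = {0, 1, 2, 3, 4, 5}`, `extractElements(v, 1, 3)` returns `{1, 4}` and `extractElements(v, 0, 2)` returns `{0, 2, 4}`.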

In D94444#2497697, @paulwalker-arm wrote:

A bit of a flyby review as I'm still on holidays but to my mind many of the restrictions being proposed for the new intrinsic seem purely down to the design decision of splitting the input vector across two operands. I understand this is how the underlying instructions work for SVE but that does not seem like a good enough reason to compromise the IR.

So my first questions are whether the IR and ISD interfaces need to match and from an IR point of view what is the expected usage?

My main IR use case is Complex vectorization. The vector Complex lowerings require vectors of just the reals and/or imags for the intermediate steps.

And also the trivial case of a stride 2 loop.

Is having two input operands going to result in the common case of having to "split" the result of a large load? I ask because I recall this being how InterleavedAccess worked with LLVM (i.e. one big load with a set of shuffles to extract the lanes).

Yeah, I could see where a large load would need to be split. That doesn't seem like too much of a headache though. We're going to do the two loads either way.

The two-operand intrinsics are my preferred choice since when we vectorize loops, we want to keep full vectors. We don't want to run the loop twice as many times on half-full vectors, or pay the vector concatenation cost in the loop. This does map pretty well to SVE. We either do an LD2 if the operands are from memory and throw away one result, or a UZP if they're in registers. Not sure how this would map to RISC-V.

If we have one-operand intrinsics, we'd need two UZPs for the lo and hi halves, and then a splice. I suppose ISel could combine those two patterns into a two-operand UZP though. Unless someone has a better lowering?

The two-operand intrinsic could also be extended to accept one undef operand. So there is some flexibility there to get the same one-operand intrinsic result.
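The difference between the two routes can be modelled in C++ over fixed-length vectors (illustration only; all names are hypothetical, and this does not model actual SVE register behaviour). `evensOfPair` is the two-operand form, the even elements of `lo:hi`, while `evensViaHalves` takes the evens of each half separately and splices them, as the one-operand lowering would. The two agree whenever each half has an even number of elements, which is what would let ISel fold the UZP/UZP/splice pattern into a single two-operand UZP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One-operand form: the even elements of v (a half-width result).
template <typename T>
std::vector<T> evensOf(const std::vector<T>& v) {
  std::vector<T> result;
  for (std::size_t i = 0; i < v.size(); i += 2)
    result.push_back(v[i]);
  return result;
}

// Concatenation of two vectors (the "splice" step).
template <typename T>
std::vector<T> concat(std::vector<T> a, const std::vector<T>& b) {
  a.insert(a.end(), b.begin(), b.end());
  return a;
}

// Two-operand form: even-indexed elements of the concatenation lo:hi.
template <typename T>
std::vector<T> evensOfPair(const std::vector<T>& lo, const std::vector<T>& hi) {
  assert(lo.size() == hi.size() && "operands must have the same type");
  return evensOf(concat(lo, hi));
}

// One-operand route: evens of each half, then splice the two results.
template <typename T>
std::vector<T> evensViaHalves(const std::vector<T>& lo, const std::vector<T>& hi) {
  assert(lo.size() == hi.size() && "operands must have the same type");
  return concat(evensOf(lo), evensOf(hi));
}
```

With `lo = {0, 1, 2, 3}` and `hi = {4, 5, 6, 7}` both functions return `{0, 2, 4, 6}`; with three-element halves they differ, echoing the element-count restrictions discussed earlier.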

My second question is what are the code generation advantages of having multiple operands against the negatives. We know type legalisation is a negative but I'm guessing the advantage is it allows a simpler mapping to the underlying SVE instructions. The question is whether this is worth the cost.

By only having a single input vector I believe the current proposed type restrictions disappear, as widening becomes quite easy. The downside is that some of this type legalisation becomes more complex, but this feels worth it if it means fewer compromises. From an SVE point of view it seems pretty easy to rely on common type legalisation until the input vector is twice the size of the legal type, at which point we custom lower to the relevant AArch64-specific node, which mirrors how we handle things like ZERO_EXTEND today.

I don't have a strong sense of what the trade-offs are. Maybe you can elaborate once you're back from vacation.

My final question relates to future usages and how the intrinsic's idiom scales. Taking the above InterleavedAccess example, there is a requirement to have a stride other than two, for example pixel data will want three or four. One route is to add an intrinsic for each option but I'm wondering if there's any appetite for a single generic intrinsic of the form:

<A x Elt> llvm.experimental.vector.extract.elements( %invec, i32 index, i32 stride)

Where index and stride are required to be constant immediate values with "stride > 0" and "0 <= index < stride".

If it helps we could also initially restrict the range of stride as this is something that can be easily changed with improved code generation abilities. By this I mean with your current patch we can restrict it to being <=2 and still have distinct ISD nodes for these supported variants if that results in the better implementation.

I like this idea a lot. Essentially a step vector shuffle. You could even roll splats into it with a 0 stride. Implementing it sounds pretty challenging though, especially for an index >= 2. Maybe I'm missing an easy solution, but that sounds like a lot of work to generalize.

Having said that, I wonder if we should revisit the idea of allowing shuffle vectors to accept step vector masks?

cameron.mcinally mentioned this in D94708: [IR] Introduce llvm.experimental.vector.splice intrinsic. Jan 15 2021, 8:27 AM

Matt added a subscriber: Matt. Jan 19 2021, 9:14 AM

Having said that, I wonder if we should revisit the idea of allowing shuffle vectors to accept step vector masks?

At today's sync-up meeting, this met strong resistance. @ctetreau argued that we don't want to allow a stepvector constant as the shuffle vector mask operand, since that could lead to a slippery slope of constant initializer strings. Chris, correct my paraphrase if you'd like...

In D94444#2497697, @paulwalker-arm wrote:

<A x Elt> llvm.experimental.vector.extract.elements( %invec, i32 index, i32 stride)

Where index and stride are required to be constant immediate values with "stride > 0" and "0 <= index < stride".

Thinking about this some more, this wouldn't be too bad to implement. I'm okay going this route, unless anyone feels strongly against it...

timsmith78 added a subscriber: timsmith78. Jan 21 2021, 9:23 AM

In D94444#2497697, @paulwalker-arm wrote:
<A x Elt> llvm.experimental.vector.extract.elements( %invec, i32 index, i32 stride)

Sorry for the slow reply. I'm just getting back to looking at this and now notice it is a unary shuffle. I'd like to see this as a binary shuffle. E.g.:

#include <complex>

void foo(double res[16], double x[16], std::complex<double> vec[16]) {
  for (int i = 0; i < 16; i++)
    res[i] = x[i] + vec[i].real();
  return;
}

In the general vectorization case, we want to keep the vectors as full as possible on each iteration. I think the Complex part of the loop body should look like:

%lo = load %vec, 0
%hi = load %vec, 64
%reals = extract_elements(%lo, %hi, 0, 2)

And not splicing together two 1/2 width vectors:

%lo = load %vec, 0
%reals_lo = extract_elements(%lo, 0, 2)
%hi = load %vec, 64
%reals_hi = extract_elements(%hi, 0, 2)
%reals = concat(%reals_lo, %reals_hi)

And also not having 2x the loop trips on 1/2 width vectors:

%ld = load %vec, 0
%reals = extract_elements(%ld, 0, 2)

I'm hand-waving over some other obvious optimizations, but I think this illustrates the unary shuffle problem pretty well. Thoughts?

For now I'll just cover the IR side of things as the ISD node discussion raises different points and there's nothing to say they need to match.

If you take your code snippet (although I changed the loop trip count to 1024 to allow vectorisation) and look at the IR emitted by LoopVectorize, you'll see what I was referring to in my previous comment. You end up with the following snippet within vector.body:

%wide.vec = load <4 x double>, <4 x double>* %11, align 8, !tbaa !6
%wide.vec23 = load <4 x double>, <4 x double>* %13, align 8, !tbaa !6
%strided.vec = shufflevector <4 x double> %wide.vec, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec24 = shufflevector <4 x double> %wide.vec23, <4 x double> poison, <2 x i32> <i32 0, i32 2>

So today the loop is vectorised using vectors as full as possible; in this case the loop was also unrolled, hence the pair of loads and shuffles. Here LoopVectorize simply creates a double-length load and a matching shuffle to extract the even lanes. If the loop operated on the imaginary parts then there would also be a shuffle to extract the odd lanes. There's no concatenation or splicing involved and the "large" load is trivial to code generate. For AArch64 there is also the InterleavedAccess pass that knows how to convert this logic to an aarch64.ld2 intrinsic call. This is something we'll want for SVE as well, although with the shufflevector replaced by an intrinsic it'll be simpler for SVE to detect, as InterleavedAccess is a tad complicated.

This is why I believe at the IR level we should have an intrinsic that mirrors this type of shuffle, and thus one that takes a single vector and extracts elements based on a simple pattern (i.e. odd or even...). Doing so means it'll be a drop-in replacement for the existing shufflevector usage, which is the goal. Note that if complex were changed to a three-element structure, LoopVectorize would do the expected thing: create a triple-wide load and shuffles to extract every third element starting at index 0, 1, or 2 based on the field in question.
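The `<2 x i32> <i32 0, i32 2>` masks in the LoopVectorize output above follow the pattern just described: field `index` of records with `stride` fields sits at `index, index + stride, index + 2*stride, ...`. Below is a minimal C++ sketch of building such a mask for a fixed vectorization factor `vf`; the function name is hypothetical, and LLVM has its own helper for this, `createStrideMask` in `llvm/Analysis/VectorUtils.h`. For scalable vectors `vf` is not a compile-time constant, so a shufflevector cannot spell out this mask, which is why the thread looks to an intrinsic instead.

```cpp
#include <cassert>
#include <vector>

// Build the constant shuffle mask that pulls field `index` out of a wide
// load of `vf` records with `stride` fields each.
std::vector<int> stridedMask(int index, int stride, int vf) {
  assert(stride > 0 && index >= 0 && index < stride && vf > 0);
  std::vector<int> mask;
  mask.reserve(vf);
  for (int i = 0; i < vf; ++i)
    mask.push_back(index + i * stride);
  return mask;
}
```

`stridedMask(0, 2, 2)` reproduces `{0, 2}`, and `stridedMask(1, 3, 4)` gives `{1, 4, 7, 10}` for the second field of a three-field struct.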

Ok, I see where you are coming from now. LoopVectorize is keeping the shuffle result full by widening the load+shuffle to double width. LV's double-wide choice seems like a weird one, but I suppose if that sequence is codegen'd correctly, then it will work out.

It will be interesting to see how this codegens when the input lives in registers. But again I suppose that ISelLowering could straighten it out if needed.

In D94444#2497697, @paulwalker-arm wrote:
<A x Elt> llvm.experimental.vector.extract.elements( %invec, i32 index, i32 stride)

Working through this now. @paulwalker-arm, have you given any thought to what happens to A below?

<vscale x A x double> llvm.experimental.vector.extract.elements(<vscale x 2 x double> %invec, i32 0, i32 4)

What should A be here? It should really be 0.5, but my understanding is we're not doing fractional VLs. I suppose that we could restrict stride to be >= the known VL.

I also wonder what A should be with odd strides, like 3. Any thoughts on that?

Considering:

<A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Suitable restrictions are that "A = B/stride" and that "B % stride == 0". When combined with the original restriction that "0 <= index < stride", I think this nicely joins up our use case of having an intrinsic that effectively extracts field "index" from a vector of structs[1] where each struct has "stride" fields.

[1] invec is not actually a vector of structs at the LLVM level, but logically this is what the vector represents.

There is the option to have no restrictions and allow a use case like:

<2 x Elt> llvm.experimental.vector.extract.elements(<vscale x 4 x Elt> %invec, i32 7, i32 13)

but I honestly think that's needlessly looking for trouble. Besides, this extreme use case has the same interface and so once a restricted variant is available it would be easy enough for others to soften restrictions if they see a genuine reason to do so. The important part is to ensure the intrinsic's return and parameter types are good enough to allow these future use cases, which I believe we've achieved in this instance.
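The restricted form can be stated as a small C++ validity check. This is a sketch only: `VecCount` and `isValidExtractElements` are hypothetical names, and reading "A = B/stride" as also requiring both types to be scalable or both fixed-width is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of a vector type's element count: <vscale x N x Ty> has
// scalable == true, <N x Ty> has scalable == false.
struct VecCount {
  std::uint64_t minElts;
  bool scalable;
};

// Proposed restrictions for
//   <A x Elt> extract.elements(<B x Elt> %invec, i32 index, i32 stride):
// stride > 0, index < stride, B % stride == 0 and A == B / stride.
bool isValidExtractElements(VecCount result, VecCount input,
                            std::uint32_t index, std::uint32_t stride) {
  assert(input.minElts > 0 && "vectors have at least one element");
  if (stride == 0 || index >= stride)
    return false;
  if (result.scalable != input.scalable) // assumed: same kind of vector
    return false;
  if (input.minElts % stride != 0)
    return false;
  return result.minElts == input.minElts / stride;
}
```

Under this check the `<vscale x 2 x double>` input with stride 4 from the question above is rejected because 2 is not a multiple of 4, so the fractional-VL case never arises, while `<vscale x 4 x Elt>` with stride 2 gives `<vscale x 2 x Elt>`.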

[NOT READY FOR REVIEW]

Today is my last day at Cray/HPE, so taking a mid-development snapshot to be finished later.

This isn't working out to be as general as I'd like it though...

Any use of this intrinsic with a vector result half-width or smaller trips up Legalization/CodeGen. E.g. <vscale x 1 x X> or <vscale x 2 x f32>. I'm not sure if there are reasonable fixes for these problems or not yet.

cameron.mcinally updated this revision to Diff 320210. Jan 29 2021, 1:49 PM

Hi @cameron.mcinally I don't suppose you have any plans to do more work on this in the near future?

frasercrmck added a subscriber: frasercrmck. Aug 20 2021, 8:33 AM

This patch hasn't moved for a long time. Trying to clean up my review list in Phabricator!

Herald added a project: Restricted Project. Feb 22 2023, 3:58 AM

Herald added a subscriber: alextsao1999.

dexonsmith removed a subscriber: dexonsmith. Feb 22 2023, 8:05 AM

@cameron.mcinally : Given D141924 can this patch be abandoned?

paulwalker-arm resigned from this revision. Apr 3 2023, 10:26 AM

Revision Contents

Path                                                     Size
llvm/docs/LangRef.rst                                    24 lines
llvm/include/llvm/CodeGen/ISDOpcodes.h                   6 lines
llvm/include/llvm/IR/Intrinsics.td                       5 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp   10 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h            2 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp    17 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    25 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp     1 line
llvm/lib/IR/Verifier.cpp                                 11 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h            1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp          17 lines
llvm/test/CodeGen/AArch64/sve-extract-evens-vector.ll    220 lines

Diff 316112

llvm/docs/LangRef.rst


[...]
 vector length of the result type. If the result type is a scalable vector,
 ``idx`` is first scaled by the result type's runtime scaling factor. Elements
 ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector
 indices. If this condition cannot be determined statically but is false at
 runtime, then the result vector is undefined. The ``idx`` parameter must be a
 vector index constant type (for most targets this will be an integer pointer
 type).

+'``llvm.experimental.vector.extract.evens``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <vscale x 4 x float> @llvm.experimental.vector.extract.evens.nxv4f32(<vscale x 4 x float> %vec1, <vscale x 4 x float> %vec2)
+      declare <vscale x 2 x double> @llvm.experimental.vector.extract.evens.nxv2f64(<vscale x 2 x double> %vec1, <vscale x 2 x double> %vec2)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.vector.extract.evens.*``' intrinsic extracts a vector
+of even elements from the concatenated vector ``%vec1:%vec2``. The result type
+matches the type of the input vectors.
[inline comment, sdesmalen] Is it worth adding the clarification that it extracts the even elements from a concatenated vector %vec1:%vec2?
+
+Arguments:
+""""""""""
+
+The arguments to this intrinsic must be vectors of the same type.
[inline comment, sdesmalen] AIUI, for scalable vectors the minimum number of elements must be a power of two in order to be able to legalize this operation without widening (not just an even number as I suggested earlier). Can you add that as a restriction?

 Matrix Intrinsics
 -----------------

 Operations on matrixes requiring shape information (like number of rows/columns
 or the memory layout) can be expressed using the matrix intrinsics. These
 intrinsics require matrix dimensions to be passed as immediate arguments, and
 matrixes are passed and returned as vectors. This means that for a ``R`` x
 ``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the
[...]

llvm/include/llvm/CodeGen/ISDOpcodes.h

[...] enum NodeType {
 /// condition cannot be determined statically but is false at runtime, then
 /// the result vector is undefined. The IDX parameter must be a vector index
 /// constant type, which for most targets will be an integer pointer type.
 ///
 /// This operation supports extracting a fixed-width vector from a scalable
 /// vector, but not the other way around.
 EXTRACT_SUBVECTOR,

+/// EXTRACT_EVENS_VECTOR(VEC1, VEC2) - Returns a vector of all the even
+/// elements from the concatenated vector VEC1:VEC2. The result vector type
+/// will match the input vector type. Both input vectors must have the same
+/// type.
+EXTRACT_EVENS_VECTOR,
[inline comment, sdesmalen] Probably the same as above, clarify that it returns a vector containing all even elements of VEC1:VEC2.

 /// VECTOR_SHUFFLE(VEC1, VEC2) - Returns a vector, of the same type as
 /// VEC1/VEC2. A VECTOR_SHUFFLE node also contains an array of constant int
 /// values that indicate which value (or undef) each result element will
 /// get. These constant ints are accessible through the
 /// ShuffleVectorSDNode class. This is quite similar to the Altivec
 /// 'vperm' instruction, except that the indices must be constants and are
 /// in terms of the element size of VEC1/VEC2, not in terms of bytes.
 VECTOR_SHUFFLE,
[...]

llvm/include/llvm/IR/Intrinsics.td

[...]
 //===---------- Intrinsics to perform subvector insertion/extraction ------===//
 def int_experimental_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
 [LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],
 [IntrNoMem, ImmArg<ArgIndex<2>>]>;

 def int_experimental_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
 [llvm_anyvector_ty, llvm_i64_ty],
 [IntrNoMem, ImmArg<ArgIndex<1>>]>;
-//===----------------------------------------------------------------------===//
+def int_experimental_vector_extract_evens : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+[LLVMMatchType<0>, LLVMMatchType<0>],
+[IntrNoMem]>;

 //===----------------------------------------------------------------------===//
 // Target-specific intrinsics
 //===----------------------------------------------------------------------===//

 include "llvm/IR/IntrinsicsPowerPC.td"
 include "llvm/IR/IntrinsicsX86.td"
 include "llvm/IR/IntrinsicsARM.td"
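
A scalar reference model of the proposed intrinsic may help when reviewing the lowerings below. This is a sketch, assuming the semantics implied by the fixed-length canonicalization in SelectionDAGBuilder.cpp (lane `i` of the result is lane `2 * i` of the concatenation of both operands). `extractEvens` is a hypothetical helper, not an LLVM API, and a scalable vector is modeled as a single concrete length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of llvm.experimental.vector.extract.evens for one concrete
// vector length: lane I of the result is lane 2*I of the concatenation A:B.
template <typename T>
std::vector<T> extractEvens(const std::vector<T> &A, const std::vector<T> &B) {
  assert(A.size() == B.size() && "operands must have the same type");
  std::vector<T> Concat(A);
  Concat.insert(Concat.end(), B.begin(), B.end());

  std::vector<T> Result;
  Result.reserve(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    Result.push_back(Concat[2 * I]);
  return Result;
}
```

For example, operands `{0, 1, 2, 3}` and `{4, 5, 6, 7}` yield `{0, 2, 4, 6}`: the first half of the result comes from the first operand and the second half from the second.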

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

…	#endif
case ISD::SRL: Res = PromoteIntRes_SRL(N); break;		case ISD::SRL: Res = PromoteIntRes_SRL(N); break;
case ISD::TRUNCATE: Res = PromoteIntRes_TRUNCATE(N); break;		case ISD::TRUNCATE: Res = PromoteIntRes_TRUNCATE(N); break;
case ISD::UNDEF: Res = PromoteIntRes_UNDEF(N); break;		case ISD::UNDEF: Res = PromoteIntRes_UNDEF(N); break;
case ISD::VAARG: Res = PromoteIntRes_VAARG(N); break;		case ISD::VAARG: Res = PromoteIntRes_VAARG(N); break;
case ISD::VSCALE: Res = PromoteIntRes_VSCALE(N); break;		case ISD::VSCALE: Res = PromoteIntRes_VSCALE(N); break;

case ISD::EXTRACT_SUBVECTOR:		case ISD::EXTRACT_SUBVECTOR:
Res = PromoteIntRes_EXTRACT_SUBVECTOR(N); break;		Res = PromoteIntRes_EXTRACT_SUBVECTOR(N); break;
		case ISD::EXTRACT_EVENS_VECTOR:
		Res = PromoteIntRes_EXTRACT_EVENS_VECTOR(N); break;
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;		Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;
case ISD::INSERT_VECTOR_ELT:		case ISD::INSERT_VECTOR_ELT:
Res = PromoteIntRes_INSERT_VECTOR_ELT(N); break;		Res = PromoteIntRes_INSERT_VECTOR_ELT(N); break;
case ISD::BUILD_VECTOR:		case ISD::BUILD_VECTOR:
Res = PromoteIntRes_BUILD_VECTOR(N); break;		Res = PromoteIntRes_BUILD_VECTOR(N); break;
case ISD::SCALAR_TO_VECTOR:		case ISD::SCALAR_TO_VECTOR:
Res = PromoteIntRes_SCALAR_TO_VECTOR(N); break;		Res = PromoteIntRes_SCALAR_TO_VECTOR(N); break;
…	for (unsigned i = 0; i != OutNumElems; ++i) {
SDValue Op = DAG.getAnyExtOrTrunc(Ext, dl, NOutVTElem);		SDValue Op = DAG.getAnyExtOrTrunc(Ext, dl, NOutVTElem);
// Insert the converted element to the new vector.		// Insert the converted element to the new vector.
Ops.push_back(Op);		Ops.push_back(Op);
}		}

return DAG.getBuildVector(NOutVT, dl, Ops);		return DAG.getBuildVector(NOutVT, dl, Ops);
}		}

		SDValue DAGTypeLegalizer::PromoteIntRes_EXTRACT_EVENS_VECTOR(SDNode *N) {
		SDLoc dl(N);
		SDValue V0 = GetPromotedInteger(N->getOperand(0));
		SDValue V1 = GetPromotedInteger(N->getOperand(1));
		EVT OutVT = V0.getValueType();

		return DAG.getNode(ISD::EXTRACT_EVENS_VECTOR, dl, OutVT, V0, V1);
		}
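
`PromoteIntRes_EXTRACT_EVENS_VECTOR` simply re-emits the node on the promoted operands. That is sound because the operation only moves whole lanes and never inspects their values, so widening every lane and truncating afterwards commutes with it. The sketch below checks this for `i32` lanes promoted to `i64`; all helper names are hypothetical, and the unspecified upper bits of an any-extend are modeled as a fixed junk pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Even lanes of the concatenation A:B.
template <typename T>
std::vector<T> extractEvens(const std::vector<T> &A, const std::vector<T> &B) {
  assert(A.size() == B.size());
  std::vector<T> Result;
  for (std::size_t I = 0; I < A.size(); ++I) {
    const std::size_t Src = 2 * I; // lane index within A:B
    Result.push_back(Src < A.size() ? A[Src] : B[Src - A.size()]);
  }
  return Result;
}

// Promotion i32 -> i64. The upper 32 bits of an any-extend are unspecified,
// so they are filled with an arbitrary junk pattern here.
std::vector<std::int64_t> promoteLanes(const std::vector<std::int32_t> &V) {
  std::vector<std::int64_t> Out;
  for (std::int32_t X : V)
    Out.push_back((std::int64_t{0x5A5A5A5A} << 32) |
                  static_cast<std::uint32_t>(X));
  return Out;
}

// Truncation i64 -> i32: keep only the low 32 bits of each lane.
std::vector<std::int32_t> truncateLanes(const std::vector<std::int64_t> &V) {
  std::vector<std::int32_t> Out;
  for (std::int64_t X : V)
    Out.push_back(static_cast<std::int32_t>(static_cast<std::uint32_t>(X)));
  return Out;
}

// True if running extract-evens on promoted lanes and truncating the result
// gives the same lanes as running it on the original i32 lanes.
bool promotionPreservesEvens(const std::vector<std::int32_t> &A,
                             const std::vector<std::int32_t> &B) {
  return truncateLanes(extractEvens(promoteLanes(A), promoteLanes(B))) ==
         extractEvens(A, B);
}
```

This matches the `extract_evens_nxv2i32_promote` test below, where the promoted `nxv2i64` operation is a plain `uzp1 z0.d`.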

SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_SHUFFLE(SDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_SHUFFLE(SDNode *N) {
ShuffleVectorSDNode *SV = cast<ShuffleVectorSDNode>(N);		ShuffleVectorSDNode *SV = cast<ShuffleVectorSDNode>(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc dl(N);		SDLoc dl(N);

ArrayRef<int> NewMask = SV->getMask().slice(0, VT.getVectorNumElements());		ArrayRef<int> NewMask = SV->getMask().slice(0, VT.getVectorNumElements());

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

…	private:
void PromoteIntegerResult(SDNode *N, unsigned ResNo);		void PromoteIntegerResult(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_MERGE_VALUES(SDNode *N, unsigned ResNo);		SDValue PromoteIntRes_MERGE_VALUES(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_AssertSext(SDNode *N);		SDValue PromoteIntRes_AssertSext(SDNode *N);
SDValue PromoteIntRes_AssertZext(SDNode *N);		SDValue PromoteIntRes_AssertZext(SDNode *N);
SDValue PromoteIntRes_Atomic0(AtomicSDNode *N);		SDValue PromoteIntRes_Atomic0(AtomicSDNode *N);
SDValue PromoteIntRes_Atomic1(AtomicSDNode *N);		SDValue PromoteIntRes_Atomic1(AtomicSDNode *N);
SDValue PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N, unsigned ResNo);		SDValue PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N, unsigned ResNo);
SDValue PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N);		SDValue PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N);
		SDValue PromoteIntRes_EXTRACT_EVENS_VECTOR(SDNode *N);
SDValue PromoteIntRes_VECTOR_SHUFFLE(SDNode *N);		SDValue PromoteIntRes_VECTOR_SHUFFLE(SDNode *N);
SDValue PromoteIntRes_BUILD_VECTOR(SDNode *N);		SDValue PromoteIntRes_BUILD_VECTOR(SDNode *N);
SDValue PromoteIntRes_SCALAR_TO_VECTOR(SDNode *N);		SDValue PromoteIntRes_SCALAR_TO_VECTOR(SDNode *N);
SDValue PromoteIntRes_SPLAT_VECTOR(SDNode *N);		SDValue PromoteIntRes_SPLAT_VECTOR(SDNode *N);
SDValue PromoteIntRes_EXTEND_VECTOR_INREG(SDNode *N);		SDValue PromoteIntRes_EXTEND_VECTOR_INREG(SDNode *N);
SDValue PromoteIntRes_INSERT_VECTOR_ELT(SDNode *N);		SDValue PromoteIntRes_INSERT_VECTOR_ELT(SDNode *N);
SDValue PromoteIntRes_CONCAT_VECTORS(SDNode *N);		SDValue PromoteIntRes_CONCAT_VECTORS(SDNode *N);
SDValue PromoteIntRes_BITCAST(SDNode *N);		SDValue PromoteIntRes_BITCAST(SDNode *N);
…	private:

void SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi);

void SplitVecRes_BITCAST(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_BITCAST(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_BUILD_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_BUILD_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_CONCAT_VECTORS(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_CONCAT_VECTORS(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_EXTRACT_SUBVECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_EXTRACT_SUBVECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_INSERT_SUBVECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_INSERT_SUBVECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
		void SplitVecRes_EXTRACT_EVENS_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_FPOWI(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_FPOWI(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_FCOPYSIGN(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_FCOPYSIGN(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);		void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
void SplitVecRes_MLOAD(MaskedLoadSDNode *MLD, SDValue &Lo, SDValue &Hi);		void SplitVecRes_MLOAD(MaskedLoadSDNode *MLD, SDValue &Lo, SDValue &Hi);
void SplitVecRes_MGATHER(MaskedGatherSDNode *MGT, SDValue &Lo, SDValue &Hi);		void SplitVecRes_MGATHER(MaskedGatherSDNode *MGT, SDValue &Lo, SDValue &Hi);
void SplitVecRes_ScalarOp(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_ScalarOp(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_SETCC(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_SETCC(SDNode *N, SDValue &Lo, SDValue &Hi);

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

…	#endif
case ISD::SELECT: SplitRes_SELECT(N, Lo, Hi); break;		case ISD::SELECT: SplitRes_SELECT(N, Lo, Hi); break;
case ISD::SELECT_CC: SplitRes_SELECT_CC(N, Lo, Hi); break;		case ISD::SELECT_CC: SplitRes_SELECT_CC(N, Lo, Hi); break;
case ISD::UNDEF: SplitRes_UNDEF(N, Lo, Hi); break;		case ISD::UNDEF: SplitRes_UNDEF(N, Lo, Hi); break;
case ISD::BITCAST: SplitVecRes_BITCAST(N, Lo, Hi); break;		case ISD::BITCAST: SplitVecRes_BITCAST(N, Lo, Hi); break;
case ISD::BUILD_VECTOR: SplitVecRes_BUILD_VECTOR(N, Lo, Hi); break;		case ISD::BUILD_VECTOR: SplitVecRes_BUILD_VECTOR(N, Lo, Hi); break;
case ISD::CONCAT_VECTORS: SplitVecRes_CONCAT_VECTORS(N, Lo, Hi); break;		case ISD::CONCAT_VECTORS: SplitVecRes_CONCAT_VECTORS(N, Lo, Hi); break;
case ISD::EXTRACT_SUBVECTOR: SplitVecRes_EXTRACT_SUBVECTOR(N, Lo, Hi); break;		case ISD::EXTRACT_SUBVECTOR: SplitVecRes_EXTRACT_SUBVECTOR(N, Lo, Hi); break;
case ISD::INSERT_SUBVECTOR: SplitVecRes_INSERT_SUBVECTOR(N, Lo, Hi); break;		case ISD::INSERT_SUBVECTOR: SplitVecRes_INSERT_SUBVECTOR(N, Lo, Hi); break;
		case ISD::EXTRACT_EVENS_VECTOR:
		SplitVecRes_EXTRACT_EVENS_VECTOR(N, Lo, Hi);
		break;
case ISD::FPOWI: SplitVecRes_FPOWI(N, Lo, Hi); break;		case ISD::FPOWI: SplitVecRes_FPOWI(N, Lo, Hi); break;
case ISD::FCOPYSIGN: SplitVecRes_FCOPYSIGN(N, Lo, Hi); break;		case ISD::FCOPYSIGN: SplitVecRes_FCOPYSIGN(N, Lo, Hi); break;
case ISD::INSERT_VECTOR_ELT: SplitVecRes_INSERT_VECTOR_ELT(N, Lo, Hi); break;		case ISD::INSERT_VECTOR_ELT: SplitVecRes_INSERT_VECTOR_ELT(N, Lo, Hi); break;
case ISD::SPLAT_VECTOR:		case ISD::SPLAT_VECTOR:
case ISD::SCALAR_TO_VECTOR:		case ISD::SCALAR_TO_VECTOR:
SplitVecRes_ScalarOp(N, Lo, Hi);		SplitVecRes_ScalarOp(N, Lo, Hi);
break;		break;
case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;		case ISD::SIGN_EXTEND_INREG: SplitVecRes_InregOp(N, Lo, Hi); break;
…	void DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR(SDNode *N, SDValue &Lo,
StackPtr =		StackPtr =
DAG.getMemBasePlusOffset(StackPtr, TypeSize::Fixed(IncrementSize), dl);		DAG.getMemBasePlusOffset(StackPtr, TypeSize::Fixed(IncrementSize), dl);

// Load the Hi part from the stack slot.		// Load the Hi part from the stack slot.
Hi = DAG.getLoad(Hi.getValueType(), dl, Store, StackPtr,		Hi = DAG.getLoad(Hi.getValueType(), dl, Store, StackPtr,
PtrInfo.getWithOffset(IncrementSize), SmallestAlign);		PtrInfo.getWithOffset(IncrementSize), SmallestAlign);
}		}

		void DAGTypeLegalizer::SplitVecRes_EXTRACT_EVENS_VECTOR(SDNode *N, SDValue &Lo,
		SDValue &Hi) {
		SDLoc dl(N);
		SDValue Src1Lo, Src1Hi;
		GetSplitVector(N->getOperand(0), Src1Lo, Src1Hi);
		SDValue Src2Lo, Src2Hi;
		GetSplitVector(N->getOperand(1), Src2Lo, Src2Hi);

		Lo = DAG.getNode(ISD::EXTRACT_EVENS_VECTOR, dl, Src1Lo.getValueType(),
		Src1Lo, Src2Lo);
		Hi = DAG.getNode(ISD::EXTRACT_EVENS_VECTOR, dl, Src1Hi.getValueType(),
		Src1Hi, Src2Hi);
		}
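
Under the concatenation semantics used for the fixed-length path (shuffle mask `i * 2`), the low half of the result contains only lanes of `Src1` and the high half only lanes of `Src2`. A split therefore has to pair `Src1Lo` with `Src1Hi` (and `Src2Lo` with `Src2Hi`) to reproduce the unsplit result. The sketch below, with hypothetical helpers over `std::vector<int>`, compares that decomposition against the pairing used in `SplitVecRes_EXTRACT_EVENS_VECTOR` above. Under this model the two disagree, and the `uzp1 z0.d, z0.d, z2.d` CHECK line in `extract_evens_nxv4i64_split` reflects the latter pairing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Even lanes of the concatenation A:B (the intrinsic's semantics).
Vec extractEvens(const Vec &A, const Vec &B) {
  assert(A.size() == B.size());
  Vec Result;
  for (std::size_t I = 0; I < A.size(); ++I) {
    const std::size_t Src = 2 * I; // lane index within A:B
    Result.push_back(Src < A.size() ? A[Src] : B[Src - A.size()]);
  }
  return Result;
}

// Split V into low and high halves (the role GetSplitVector plays).
void splitHalves(const Vec &V, Vec &Lo, Vec &Hi) {
  assert(V.size() % 2 == 0);
  auto Mid = V.begin() + static_cast<std::ptrdiff_t>(V.size() / 2);
  Lo.assign(V.begin(), Mid);
  Hi.assign(Mid, V.end());
}

Vec concat(const Vec &Lo, const Vec &Hi) {
  Vec Out(Lo);
  Out.insert(Out.end(), Hi.begin(), Hi.end());
  return Out;
}

// Each result half is built from the two halves of ONE source operand.
Vec splitBySource(const Vec &Src1, const Vec &Src2) {
  Vec S1Lo, S1Hi, S2Lo, S2Hi;
  splitHalves(Src1, S1Lo, S1Hi);
  splitHalves(Src2, S2Lo, S2Hi);
  return concat(extractEvens(S1Lo, S1Hi), extractEvens(S2Lo, S2Hi));
}

// Matching halves of the two sources are paired, as in the hunk above
// (Lo from Src1Lo/Src2Lo, Hi from Src1Hi/Src2Hi).
Vec splitByHalf(const Vec &Src1, const Vec &Src2) {
  Vec S1Lo, S1Hi, S2Lo, S2Hi;
  splitHalves(Src1, S1Lo, S1Hi);
  splitHalves(Src2, S2Lo, S2Hi);
  return concat(extractEvens(S1Lo, S2Lo), extractEvens(S1Hi, S2Hi));
}
```

With `Src1 = {0..7}` and `Src2 = {8..15}`, the unsplit result is `{0, 2, 4, 6, 8, 10, 12, 14}`; `splitBySource` reproduces it, while `splitByHalf` yields `{0, 2, 8, 10, 4, 6, 12, 14}`.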

void DAGTypeLegalizer::SplitVecRes_FPOWI(SDNode *N, SDValue &Lo,		void DAGTypeLegalizer::SplitVecRes_FPOWI(SDNode *N, SDValue &Lo,
SDValue &Hi) {		SDValue &Hi) {
SDLoc dl(N);		SDLoc dl(N);
GetSplitVector(N->getOperand(0), Lo, Hi);		GetSplitVector(N->getOperand(0), Lo, Hi);
Lo = DAG.getNode(ISD::FPOWI, dl, Lo.getValueType(), Lo, N->getOperand(1));		Lo = DAG.getNode(ISD::FPOWI, dl, Lo.getValueType(), Lo, N->getOperand(1));
Hi = DAG.getNode(ISD::FPOWI, dl, Hi.getValueType(), Hi, N->getOperand(1));		Hi = DAG.getNode(ISD::FPOWI, dl, Hi.getValueType(), Hi, N->getOperand(1));
}		}

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

…	case Intrinsic::experimental_vector_extract: {

SDValue Vec = getValue(I.getOperand(0));		SDValue Vec = getValue(I.getOperand(0));
SDValue Index = getValue(I.getOperand(1));		SDValue Index = getValue(I.getOperand(1));
EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());		EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());

setValue(&I, DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResultVT, Vec, Index));		setValue(&I, DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResultVT, Vec, Index));
return;		return;
}		}
		case Intrinsic::experimental_vector_extract_evens: {
		auto DL = getCurSDLoc();

		SDValue Src1 = getValue(I.getOperand(0));
		SDValue Src2 = getValue(I.getOperand(1));
		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());

		if (VT.isScalableVector()) {
		setValue(&I, DAG.getNode(ISD::EXTRACT_EVENS_VECTOR, DL, VT, Src1, Src2));
		return;
		}

		assert(VT.isFixedLengthVector() &&
		"Expected a fixed-length vector in vector_extract_evens!");

		// Otherwise the vector is fixed-length: canonicalize to a VECTOR_SHUFFLE
		// whose mask selects the even lanes (0, 2, 4, ...) of Src1:Src2.
		unsigned NumElts = VT.getVectorElementCount().getKnownMinValue();
		SmallVector<int> Mask(NumElts, -1);
		sdesmalen: nit: SmallVector chooses a default N since D92522, and https://llvm.org/docs/ProgrammersManual.html recommends leaving out N unless there is a well-motivated choice.
		for (unsigned i = 0; i < NumElts; ++i)
		Mask[i] = i * 2;

		setValue(&I, DAG.getVectorShuffle(VT, DL, Src1, Src2, Mask));
		return;
		}
}		}
}		}
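
For fixed-length vectors the builder emits an ordinary `VECTOR_SHUFFLE` instead of the new node. The sketch below mirrors the mask loop above and applies the mask with a minimal model of shuffle semantics (a mask index below `N` reads the first operand, an index in `[N, 2N)` reads the second, and `-1` is undef). The function names are hypothetical, and `std::optional` (C++17) stands in for undef lanes.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Mask built for fixed-length vectors, mirroring the loop above:
// result lane I reads lane 2*I of the concatenation Src1:Src2.
std::vector<int> buildEvensMask(unsigned NumElts) {
  std::vector<int> Mask(NumElts, -1);
  for (unsigned I = 0; I < NumElts; ++I)
    Mask[I] = static_cast<int>(I * 2);
  return Mask;
}

// Minimal model of VECTOR_SHUFFLE: an index below N selects from V1, an
// index in [N, 2N) selects from V2, and -1 marks an undef lane (nullopt).
std::vector<std::optional<int>> applyShuffle(const std::vector<int> &V1,
                                             const std::vector<int> &V2,
                                             const std::vector<int> &Mask) {
  assert(V1.size() == V2.size());
  const int N = static_cast<int>(V1.size());
  std::vector<std::optional<int>> Out;
  for (int M : Mask) {
    assert(M < 2 * N && "mask index out of range");
    if (M < 0)
      Out.push_back(std::nullopt);
    else
      Out.push_back(M < N ? V1[M] : V2[M - N]);
  }
  return Out;
}
```

With four lanes the mask is `{0, 2, 4, 6}`; with two lanes it is `{0, 2}`, which selects lane 0 of each operand.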

void SelectionDAGBuilder::visitConstrainedFPIntrinsic(		void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
const ConstrainedFPIntrinsic &FPI) {		const ConstrainedFPIntrinsic &FPI) {
SDLoc sdl = getCurSDLoc();		SDLoc sdl = getCurSDLoc();

const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

…	#endif
case ISD::SELECT: return "select";		case ISD::SELECT: return "select";
case ISD::VSELECT: return "vselect";		case ISD::VSELECT: return "vselect";
case ISD::SELECT_CC: return "select_cc";		case ISD::SELECT_CC: return "select_cc";
case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";		case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";
case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";		case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";
case ISD::CONCAT_VECTORS: return "concat_vectors";		case ISD::CONCAT_VECTORS: return "concat_vectors";
case ISD::INSERT_SUBVECTOR: return "insert_subvector";		case ISD::INSERT_SUBVECTOR: return "insert_subvector";
case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";		case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";
		case ISD::EXTRACT_EVENS_VECTOR: return "extract_evens_vector";
case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";		case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";
case ISD::VECTOR_SHUFFLE: return "vector_shuffle";		case ISD::VECTOR_SHUFFLE: return "vector_shuffle";
case ISD::SPLAT_VECTOR: return "splat_vector";		case ISD::SPLAT_VECTOR: return "splat_vector";
case ISD::CARRY_FALSE: return "carry_false";		case ISD::CARRY_FALSE: return "carry_false";
case ISD::ADDC: return "addc";		case ISD::ADDC: return "addc";
case ISD::ADDE: return "adde";		case ISD::ADDE: return "adde";
case ISD::ADDCARRY: return "addcarry";		case ISD::ADDCARRY: return "addcarry";
case ISD::SADDO_CARRY: return "saddo_carry";		case ISD::SADDO_CARRY: return "saddo_carry";

llvm/lib/IR/Verifier.cpp

…	case Intrinsic::experimental_vector_extract: {
VectorType *VecTy = cast<VectorType>(Call.getArgOperand(0)->getType());		VectorType *VecTy = cast<VectorType>(Call.getArgOperand(0)->getType());

Assert(ResultTy->getElementType() == VecTy->getElementType(),		Assert(ResultTy->getElementType() == VecTy->getElementType(),
"experimental_vector_extract result must have the same element "		"experimental_vector_extract result must have the same element "
"type as the input vector.",		"type as the input vector.",
&Call);		&Call);
break;		break;
}		}
		case Intrinsic::experimental_vector_extract_evens: {
		VectorType *ResultTy = cast<VectorType>(Call.getType());
		VectorType *Op1Ty = cast<VectorType>(Call.getArgOperand(0)->getType());
		VectorType *Op2Ty = cast<VectorType>(Call.getArgOperand(1)->getType());

		Assert(ResultTy == Op1Ty && Op1Ty == Op2Ty,
		"experimental_vector_extract_evens result must have the same "
		"type as the input vectors.",
		&Call);
		break;
		}
		sdesmalen: Can you also check the constraint that the minimum number of elements must be a power of two for scalable vectors?
		cameron.mcinally (author): I'm still getting up to speed with ElementCount, so I'm not sure that this is the best way to use it. Any experts?
};		};
}		}
		sdesmalen: Can these two Asserts be merged into one?
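
One possible shape for the check sdesmalen asks for, written as a standalone model rather than against the real `Verifier` API. It assumes each type is summarized by its (known-minimum) element count and a scalable flag, with element types ignored; `VecTypeDesc` and the function names are hypothetical. In `Verifier.cpp` itself the count would presumably come from `ElementCount::getKnownMinValue()` and the test from `isPowerOf2_32` in `llvm/Support/MathExtras.h`.

```cpp
#include <cassert>

// Hypothetical summary of a vector type for this sketch.
struct VecTypeDesc {
  unsigned MinNumElts; // element count (known minimum for scalable types)
  bool Scalable;
};

// True iff N is a non-zero power of two (same idea as llvm::isPowerOf2_32).
constexpr bool isPowerOf2(unsigned N) { return N != 0 && (N & (N - 1)) == 0; }

// All three types must match (as the existing Assert requires); scalable
// vectors must additionally have a power-of-two minimum element count.
bool isValidExtractEvensSignature(const VecTypeDesc &Result,
                                  const VecTypeDesc &Op1,
                                  const VecTypeDesc &Op2) {
  auto Same = [](const VecTypeDesc &X, const VecTypeDesc &Y) {
    return X.MinNumElts == Y.MinNumElts && X.Scalable == Y.Scalable;
  };
  if (!Same(Result, Op1) || !Same(Op1, Op2))
    return false;
  return !Result.Scalable || isPowerOf2(Result.MinNumElts);
}
```

Under this model `<vscale x 4 x i32>` passes, `<vscale x 6 x i32>` is rejected, and fixed-length `<6 x i32>` is still accepted.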

/// Carefully grab the subprogram from a local scope.		/// Carefully grab the subprogram from a local scope.
///		///
/// This carefully grabs the subprogram from a local scope, avoiding the		/// This carefully grabs the subprogram from a local scope, avoiding the
/// built-in assertions that would typically fire.		/// built-in assertions that would typically fire.
static DISubprogram getSubprogram(Metadata LocalScope) {		static DISubprogram getSubprogram(Metadata LocalScope) {
if (!LocalScope)		if (!LocalScope)
return nullptr;		return nullptr;

llvm/lib/Target/AArch64/AArch64ISelLowering.h

…	private:
SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerATOMIC_LOAD_SUB(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerATOMIC_LOAD_SUB(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerATOMIC_LOAD_AND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerATOMIC_LOAD_AND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerWindowsDYNAMIC_STACKALLOC(SDValue Op, SDValue Chain,		SDValue LowerWindowsDYNAMIC_STACKALLOC(SDValue Op, SDValue Chain,
SDValue &Size,		SDValue &Size,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;
		SDValue LowerExtractEvensVector(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSVEStructLoad(unsigned Intrinsic, ArrayRef<SDValue> LoadOps,		SDValue LowerSVEStructLoad(unsigned Intrinsic, ArrayRef<SDValue> LoadOps,
EVT VT, SelectionDAG &DAG, const SDLoc &DL) const;		EVT VT, SelectionDAG &DAG, const SDLoc &DL) const;

SDValue LowerFixedLengthVectorIntDivideToSVE(SDValue Op,		SDValue LowerFixedLengthVectorIntDivideToSVE(SDValue Op,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;
SDValue LowerFixedLengthVectorIntExtendToSVE(SDValue Op,		SDValue LowerFixedLengthVectorIntExtendToSVE(SDValue Op,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;
SDValue LowerFixedLengthVectorLoadToSVE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFixedLengthVectorLoadToSVE(SDValue Op, SelectionDAG &DAG) const;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

…	for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);		setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
setOperationAction(ISD::VECREDUCE_AND, VT, Custom);		setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
setOperationAction(ISD::VECREDUCE_OR, VT, Custom);		setOperationAction(ISD::VECREDUCE_OR, VT, Custom);
setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);		setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);
setOperationAction(ISD::VECREDUCE_UMIN, VT, Custom);		setOperationAction(ISD::VECREDUCE_UMIN, VT, Custom);
setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);		setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);
setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);		setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);		setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
		setOperationAction(ISD::EXTRACT_EVENS_VECTOR, VT, Custom);
}		}

// Illegal unpacked integer vector types.		// Illegal unpacked integer vector types.
for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {		for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
}		}

…	for (auto VT : {MVT::nxv16i1, MVT::nxv8i1, MVT::nxv4i1, MVT::nxv2i1}) {
setOperationAction(ISD::UINT_TO_FP, VT, Custom);		setOperationAction(ISD::UINT_TO_FP, VT, Custom);
}		}
}		}

for (auto VT : {MVT::nxv2f16, MVT::nxv4f16, MVT::nxv8f16, MVT::nxv2f32,		for (auto VT : {MVT::nxv2f16, MVT::nxv4f16, MVT::nxv8f16, MVT::nxv2f32,
MVT::nxv4f32, MVT::nxv2f64}) {		MVT::nxv4f32, MVT::nxv2f64}) {
setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);		setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
		setOperationAction(ISD::EXTRACT_EVENS_VECTOR, VT, Custom);
setOperationAction(ISD::MGATHER, VT, Custom);		setOperationAction(ISD::MGATHER, VT, Custom);
setOperationAction(ISD::MSCATTER, VT, Custom);		setOperationAction(ISD::MSCATTER, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::FADD, VT, Custom);		setOperationAction(ISD::FADD, VT, Custom);
setOperationAction(ISD::FDIV, VT, Custom);		setOperationAction(ISD::FDIV, VT, Custom);
setOperationAction(ISD::FMA, VT, Custom);		setOperationAction(ISD::FMA, VT, Custom);
setOperationAction(ISD::FMAXNUM, VT, Custom);		setOperationAction(ISD::FMAXNUM, VT, Custom);
…	return LowerToPredicatedOp(Op, DAG, AArch64ISD::BITREVERSE_MERGE_PASSTHRU,
/*OverrideNEON=*/true);		/*OverrideNEON=*/true);
case ISD::BSWAP:		case ISD::BSWAP:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::BSWAP_MERGE_PASSTHRU);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::BSWAP_MERGE_PASSTHRU);
case ISD::CTLZ:		case ISD::CTLZ:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU,		return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU,
/*OverrideNEON=*/true);		/*OverrideNEON=*/true);
case ISD::CTTZ:		case ISD::CTTZ:
return LowerCTTZ(Op, DAG);		return LowerCTTZ(Op, DAG);
		case ISD::EXTRACT_EVENS_VECTOR:
		return LowerExtractEvensVector(Op, DAG);
}		}
}		}

bool AArch64TargetLowering::mergeStoresAfterLegalization(EVT VT) const {		bool AArch64TargetLowering::mergeStoresAfterLegalization(EVT VT) const {
return !Subtarget->useSVEForFixedLengthVectors();		return !Subtarget->useSVEForFixedLengthVectors();
}		}

bool AArch64TargetLowering::useSVEForFixedLengthVectorVT(		bool AArch64TargetLowering::useSVEForFixedLengthVectorVT(
…	SDValue AArch64TargetLowering::LowerShiftLeftParts(SDValue Op,
SDValue LoForNormalShift = DAG.getNode(ISD::SHL, dl, VT, ShOpLo, ShAmt);		SDValue LoForNormalShift = DAG.getNode(ISD::SHL, dl, VT, ShOpLo, ShAmt);
SDValue Lo = DAG.getNode(AArch64ISD::CSEL, dl, VT, LoForBigShift,		SDValue Lo = DAG.getNode(AArch64ISD::CSEL, dl, VT, LoForBigShift,
LoForNormalShift, CCVal, Cmp);		LoForNormalShift, CCVal, Cmp);

SDValue Ops[2] = { Lo, Hi };		SDValue Ops[2] = { Lo, Hi };
return DAG.getMergeValues(Ops, dl);		return DAG.getMergeValues(Ops, dl);
}		}

		SDValue
		AArch64TargetLowering::LowerExtractEvensVector(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		EVT VT = Op.getValueType();
		assert(VT.isScalableVector() &&
		"Unexpected fixed length vector in LowerExtractEvensVector!");

		SDValue Src1 = Op.getOperand(0);
		SDValue Src2 = Op.getOperand(1);
		return DAG.getNode(AArch64ISD::UZP1, DL, VT, Src1, Src2);
		}
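
The summary motivates the full-width result by its mapping onto SVE `LD2` and `UZP1`. The sketch below models both at lane level, assuming an all-true predicate for `LD2` and `nxv4i32` (four 32-bit lanes per 128-bit granule). It checks, for several values of vscale, that `UZP1` applied to two consecutive vector loads selects the same lanes as the first destination register of `LD2`. `uzp1`, `ld2First` and `uzp1MatchesLd2` are hypothetical model functions, not intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;

// Lane-level model of SVE "UZP1 Zd, Zn, Zm": the even-numbered elements of
// the concatenation Zn:Zm, with Zn's elements first.
Lanes uzp1(const Lanes &Zn, const Lanes &Zm) {
  assert(Zn.size() == Zm.size());
  const std::size_t N = Zn.size();
  Lanes Zd(N);
  for (std::size_t I = 0; I < N; ++I) {
    const std::size_t Src = 2 * I;
    Zd[I] = Src < N ? Zn[Src] : Zm[Src - N];
  }
  return Zd;
}

// Model of the first destination register of an all-true-predicated LD2:
// lane I receives memory element 2*I (the second register, not modeled
// here, would receive element 2*I + 1).
Lanes ld2First(const std::vector<int> &Mem, std::size_t N) {
  assert(Mem.size() >= 2 * N);
  Lanes Z0(N);
  for (std::size_t I = 0; I < N; ++I)
    Z0[I] = Mem[2 * I];
  return Z0;
}

// For a given vscale, checks that UZP1 of two consecutive nxv4i32 loads
// picks the same lanes as the first register of LD2.
bool uzp1MatchesLd2(unsigned VScale) {
  const std::size_t N = 4 * static_cast<std::size_t>(VScale);
  std::vector<int> Mem(2 * N);
  for (std::size_t I = 0; I < Mem.size(); ++I)
    Mem[I] = static_cast<int>(I);
  const auto Mid = Mem.begin() + static_cast<std::ptrdiff_t>(N);
  const Lanes First(Mem.begin(), Mid); // first vector load
  const Lanes Second(Mid, Mem.end());  // second vector load
  return uzp1(First, Second) == ld2First(Mem, N);
}
```

SVE permits vector lengths from 128 to 2048 bits (vscale 1 to 16), so checking vscale 1 and 16 covers both ends of that range.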

bool AArch64TargetLowering::isOffsetFoldingLegal(		bool AArch64TargetLowering::isOffsetFoldingLegal(
const GlobalAddressSDNode *GA) const {		const GlobalAddressSDNode *GA) const {
// Offsets are folded in the DAG combine rather than here so that we can		// Offsets are folded in the DAG combine rather than here so that we can
// intelligently choose an offset based on the uses.		// intelligently choose an offset based on the uses.
return false;		return false;
}		}

bool AArch64TargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,		bool AArch64TargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,

llvm/test/CodeGen/AArch64/sve-extract-evens-vector.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s --check-prefixes=CHECK
				; RUN: FileCheck --check-prefix=WARN --allow-empty %s < %t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				;; Legal integer types

				define <16 x i8> @extract_evens_v16i8(<16 x i8> %vec1, <16 x i8> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v0.16b, v0.16b, v1.16b
				; CHECK-NEXT: ret
				%retval = call <16 x i8> @llvm.experimental.vector.extract.evens.v16i8(<16 x i8> %vec1, <16 x i8> %vec2)
				ret <16 x i8> %retval
				}

				define <8 x i16> @extract_evens_v8i16(<8 x i16> %vec1, <8 x i16> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v0.8h, v0.8h, v1.8h
				; CHECK-NEXT: ret
				%retval = call <8 x i16> @llvm.experimental.vector.extract.evens.v8i16(<8 x i16> %vec1, <8 x i16> %vec2)
				ret <8 x i16> %retval
				}

				define <4 x i32> @extract_evens_v4i32(<4 x i32> %vec1, <4 x i32> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v0.4s, v0.4s, v1.4s
				; CHECK-NEXT: ret
				%retval = call <4 x i32> @llvm.experimental.vector.extract.evens.v4i32(<4 x i32> %vec1, <4 x i32> %vec2)
				ret <4 x i32> %retval
				}

				; NOTE: Uses ZIP1 since it's only a 2 element vector.
				define <2 x i64> @extract_evens_v2i64(<2 x i64> %vec1, <2 x i64> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v0.2d, v0.2d, v1.2d
				; CHECK-NEXT: ret
				%retval = call <2 x i64> @llvm.experimental.vector.extract.evens.v2i64(<2 x i64> %vec1, <2 x i64> %vec2)
				ret <2 x i64> %retval
				}
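
The `NOTE` comments say `ZIP1` is used for two-element vectors. For two lanes, `ZIP1` and `UZP1` select exactly the same elements (lane 0 of each operand), so either instruction implements extract-evens there; for wider vectors they differ. A small lane-level model with hypothetical helper names illustrates this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;

// ZIP1 Vd, Vn, Vm: interleave the low halves of Vn and Vm.
Lanes zip1(const Lanes &Vn, const Lanes &Vm) {
  assert(Vn.size() == Vm.size() && Vn.size() % 2 == 0);
  Lanes Vd;
  for (std::size_t I = 0; I < Vn.size() / 2; ++I) {
    Vd.push_back(Vn[I]);
    Vd.push_back(Vm[I]);
  }
  return Vd;
}

// UZP1 Vd, Vn, Vm: even-numbered lanes of the concatenation Vn:Vm.
Lanes uzp1(const Lanes &Vn, const Lanes &Vm) {
  assert(Vn.size() == Vm.size());
  const std::size_t N = Vn.size();
  Lanes Vd;
  for (std::size_t I = 0; I < N; ++I) {
    const std::size_t Src = 2 * I;
    Vd.push_back(Src < N ? Vn[Src] : Vm[Src - N]);
  }
  return Vd;
}
```

With two lanes both produce `{Vn[0], Vm[0]}`; with four lanes `zip1` gives `{n0, m0, n1, m1}` while `uzp1` gives `{n0, n2, m0, m2}`.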

				define <vscale x 16 x i8> @extract_evens_nxv16i8(<vscale x 16 x i8> %vec1, <vscale x 16 x i8> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%retval = call <vscale x 16 x i8> @llvm.experimental.vector.extract.evens.nxv16i8(<vscale x 16 x i8> %vec1, <vscale x 16 x i8> %vec2)
				ret <vscale x 16 x i8> %retval
				}

				define <vscale x 8 x i16> @extract_evens_nxv8i16(<vscale x 8 x i16> %vec1, <vscale x 8 x i16> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x i16> @llvm.experimental.vector.extract.evens.nxv8i16(<vscale x 8 x i16> %vec1, <vscale x 8 x i16> %vec2)
				ret <vscale x 8 x i16> %retval
				}

				define <vscale x 4 x i32> @extract_evens_nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x i32> @llvm.experimental.vector.extract.evens.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2)
				ret <vscale x 4 x i32> %retval
				}

				define <vscale x 2 x i64> @extract_evens_nxv2i64(<vscale x 2 x i64> %vec1, <vscale x 2 x i64> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 2 x i64> @llvm.experimental.vector.extract.evens.nxv2i64(<vscale x 2 x i64> %vec1, <vscale x 2 x i64> %vec2)
				ret <vscale x 2 x i64> %retval
				}

				;; Illegal integer types

				define <vscale x 2 x i32> @extract_evens_nxv2i32_promote(<vscale x 2 x i32> %vec1, <vscale x 2 x i32> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv2i32_promote:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 2 x i32> @llvm.experimental.vector.extract.evens.nxv2i32(<vscale x 2 x i32> %vec1, <vscale x 2 x i32> %vec2)
				ret <vscale x 2 x i32> %retval
				}

				define <vscale x 4 x i64> @extract_evens_nxv4i64_split(<vscale x 4 x i64> %vec1, <vscale x 4 x i64> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv4i64_split:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.d, z0.d, z2.d
				; CHECK-NEXT: uzp1 z1.d, z1.d, z3.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x i64> @llvm.experimental.vector.extract.evens.nxv4i64(<vscale x 4 x i64> %vec1, <vscale x 4 x i64> %vec2)
				ret <vscale x 4 x i64> %retval
				}


				;; Legal floating point types

				define <8 x half> @extract_evens_v8f16(<8 x half> %vec1, <8 x half> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v0.8h, v0.8h, v1.8h
				; CHECK-NEXT: ret
				%retval = call <8 x half> @llvm.experimental.vector.extract.evens.v8f16(<8 x half> %vec1, <8 x half> %vec2)
				ret <8 x half> %retval
				}

				; NOTE: Uses ZIP1 since it's only a 2 element vector.
				define <2 x float> @extract_evens_v2f32(<2 x float> %vec1, <2 x float> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v0.2s, v0.2s, v1.2s
				; CHECK-NEXT: ret
				%retval = call <2 x float> @llvm.experimental.vector.extract.evens.v2f32(<2 x float> %vec1, <2 x float> %vec2)
				ret <2 x float> %retval
				}

				define <4 x float> @extract_evens_v4f32(<4 x float> %vec1, <4 x float> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 v0.4s, v0.4s, v1.4s
				; CHECK-NEXT: ret
				%retval = call <4 x float> @llvm.experimental.vector.extract.evens.v4f32(<4 x float> %vec1, <4 x float> %vec2)
				ret <4 x float> %retval
				}

				; NOTE: Uses ZIP1 since it's only a 2 element vector.
				define <2 x double> @extract_evens_v2f64(<2 x double> %vec1, <2 x double> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: zip1 v0.2d, v0.2d, v1.2d
				; CHECK-NEXT: ret
				%retval = call <2 x double> @llvm.experimental.vector.extract.evens.v2f64(<2 x double> %vec1, <2 x double> %vec2)
				ret <2 x double> %retval
				}

				define <vscale x 8 x half> @extract_evens_nxv8f16(<vscale x 8 x half> %vec1, <vscale x 8 x half> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x half> @llvm.experimental.vector.extract.evens.nxv8f16(<vscale x 8 x half> %vec1, <vscale x 8 x half> %vec2)
				ret <vscale x 8 x half> %retval
				}

				define <vscale x 2 x float> @extract_evens_nxv2f32(<vscale x 2 x float> %vec1, <vscale x 2 x float> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 2 x float> @llvm.experimental.vector.extract.evens.nxv2f32(<vscale x 2 x float> %vec1, <vscale x 2 x float> %vec2)
				ret <vscale x 2 x float> %retval
				}

				define <vscale x 4 x float> @extract_evens_nxv4f32(<vscale x 4 x float> %vec1, <vscale x 4 x float> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x float> @llvm.experimental.vector.extract.evens.nxv4f32(<vscale x 4 x float> %vec1, <vscale x 4 x float> %vec2)
				ret <vscale x 4 x float> %retval
				}

				define <vscale x 2 x double> @extract_evens_nxv2f64(<vscale x 2 x double> %vec1, <vscale x 2 x double> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 2 x double> @llvm.experimental.vector.extract.evens.nxv2f64(<vscale x 2 x double> %vec1, <vscale x 2 x double> %vec2)
				ret <vscale x 2 x double> %retval
				}

				;; Illegal floating point types

				define <vscale x 4 x double> @extract_evens_nxv4f64_split(<vscale x 4 x double> %vec1, <vscale x 4 x double> %vec2) nounwind {
				; CHECK-LABEL: extract_evens_nxv4f64_split:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.d, z0.d, z2.d
				; CHECK-NEXT: uzp1 z1.d, z1.d, z3.d
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x double> @llvm.experimental.vector.extract.evens.nxv4f64(<vscale x 4 x double> %vec1, <vscale x 4 x double> %vec2)
				ret <vscale x 4 x double> %retval
				}


				; Legal integer declarations
				declare <16 x i8> @llvm.experimental.vector.extract.evens.v16i8(<16 x i8>, <16 x i8>)
				declare <8 x i16> @llvm.experimental.vector.extract.evens.v8i16(<8 x i16>, <8 x i16>)
				declare <4 x i32> @llvm.experimental.vector.extract.evens.v4i32(<4 x i32>, <4 x i32>)
				declare <2 x i64> @llvm.experimental.vector.extract.evens.v2i64(<2 x i64>, <2 x i64>)
				declare <vscale x 16 x i8> @llvm.experimental.vector.extract.evens.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>)
				declare <vscale x 8 x i16> @llvm.experimental.vector.extract.evens.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>)
				declare <vscale x 4 x i32> @llvm.experimental.vector.extract.evens.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
				declare <vscale x 2 x i64> @llvm.experimental.vector.extract.evens.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>)

				; Illegal integer declarations
				declare <vscale x 2 x i32> @llvm.experimental.vector.extract.evens.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>)
				declare <vscale x 4 x i64> @llvm.experimental.vector.extract.evens.nxv4i64(<vscale x 4 x i64>, <vscale x 4 x i64>)

				; Legal floating point declarations
				declare <8 x half> @llvm.experimental.vector.extract.evens.v8f16(<8 x half>, <8 x half>)
				declare <2 x float> @llvm.experimental.vector.extract.evens.v2f32(<2 x float>, <2 x float>)
				declare <4 x float> @llvm.experimental.vector.extract.evens.v4f32(<4 x float>, <4 x float>)
				declare <2 x double> @llvm.experimental.vector.extract.evens.v2f64(<2 x double>, <2 x double>)
				declare <vscale x 8 x half> @llvm.experimental.vector.extract.evens.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x float> @llvm.experimental.vector.extract.evens.nxv2f32(<vscale x 2 x float>, <vscale x 2 x float>)
				declare <vscale x 4 x float> @llvm.experimental.vector.extract.evens.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 2 x double> @llvm.experimental.vector.extract.evens.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>)

				; Illegal floating point declarations
				declare <vscale x 4 x double> @llvm.experimental.vector.extract.evens.nxv4f64(<vscale x 4 x double>, <vscale x 4 x double>)

Revision Contents (Diff 316112)

llvm/docs/LangRef.rst

llvm/include/llvm/CodeGen/ISDOpcodes.h

llvm/include/llvm/IR/Intrinsics.td

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

llvm/lib/IR/Verifier.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/sve-extract-evens-vector.ll
