
[RFC][Scalable] Add scalable shuffle intrinsic to extract evens from a pair of vectors
Needs Review · Public

Authored by cameron.mcinally on Jan 11 2021, 12:51 PM.

Details

Summary

Here's a proposal for a scalable shuffle intrinsic that extracts the even elements from a pair of vectors. It is the first in the set that was originally discussed in the llvm-dev thread here:

https://lists.llvm.org/pipermail/llvm-dev/2020-January/138762.html

The following are some design decisions I made that could use discussion:

  1. I chose to extract the even elements from a pair of vectors (full vector result), rather than a single vector (1/2 width vector result). This is in line with existing fixed shuffle vectors. And can be extended to accept an undef argument if needed. The motivation behind this decision was that we'd want the result vector to be a full vector for performance reasons. It would also map well to SVE's LD2 and UZP1.
  2. How do we feel about the intrinsic name: llvm.experimental.vector.extract.evens(...)?
  3. How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to also allow fixed width arguments as well. Currently the fixed width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.
  4. Has anyone thought through all the legalizations that are valid on scalable vectors? Promote and Split are obviously valid. Scalarize is obviously invalid. How about Widen? Widen conflicts with the existing unpacked scalable vectors, so it's not clear if it's possible to do.

Diff Detail

Event Timeline

cameron.mcinally requested review of this revision. · Jan 11 2021, 12:51 PM
Herald added a project: Restricted Project. · Jan 11 2021, 12:51 PM

Thanks for creating this patch!

I chose to extract the even elements from a pair of vectors (full vector result), rather than a single vector (1/2 width vector result). This is in line with existing fixed shuffle vectors. And can be extended to accept an undef argument if needed. The motivation behind this decision was that we'd want the result vector to be a full vector for performance reasons. It would also map well to SVE's LD2 and UZP1.

Are you also planning to add intrinsics for interleaving?

How do we feel about the intrinsic name: llvm.experimental.vector.extract.evens(...)?

I quite like the odd/even terminology, but would prefer to drop the s, as in: "extracting the even elements".

How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to also allow fixed width arguments as well. Currently the fixed width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.

I would favour not to add SCALABLE to that name, because there is nothing limiting these nodes from being used for fixed-width vectors.

Has anyone thought through all the legalizations that are valid on scalable vectors? Promote and Split are obviously valid. Scalarize is obviously invalid. How about Widen? Widen conflicts with the existing unpacked scalable vectors, so it's not clear if it's possible to do.

You're right, I don't think widening is always possible with the current definition. It's not really about how the vectors are laid out in the registers: taking the even elements of <vscale x 3 x i32> means that for each 3 x i32 part you'd need to alternate between selecting the even and odd elements. That problem goes away if the intrinsic has the requirement that (at least for scalable vectors) the minimum number of elements needs to be a multiple of 2.

llvm/docs/LangRef.rst
16201

Is it worth adding the clarification that it extracts the even elements from a concatenated vector %vec1:%vec2?

llvm/include/llvm/CodeGen/ISDOpcodes.h
543

Probably the same as above, clarify that it returns a vector containing all even elements of VEC1:VEC2.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6997

nit: SmallVector chooses a default N since D92522 and https://llvm.org/docs/ProgrammersManual.html recommends leaving out N unless there is a well-motivated choice.

llvm/lib/IR/Verifier.cpp
5171–5178

Can these two Asserts be merged into one?

Hi all, this is just a thought and I hope I'm not confusing things further (!), but we could also have something like:

llvm.experimental.vector.deinterleave.even/odd
llvm.experimental.vector.interleave.lo/hi

since we are actually performing deinterleaving operations in this patch and I assume we'll want the matching interleaving ops at some point too? If you wanted to reduce the number of intrinsics, ISD opcodes you could also have the even/odd as a third flag, i.e.

llvm.experimental.vector.deinterleave(<>,<>, i1)

although I'm happy with separate intrinsics/opcodes too!

For what it's worth if we stick with something like llvm.experimental.vector.extract.even(s) I agree with Sander and would prefer to drop the 's' at the end.

Thanks for creating this patch!

I chose to extract the even elements from a pair of vectors (full vector result), rather than a single vector (1/2 width vector result). This is in line with existing fixed shuffle vectors. And can be extended to accept an undef argument if needed. The motivation behind this decision was that we'd want the result vector to be a full vector for performance reasons. It would also map well to SVE's LD2 and UZP1.

Are you also planning to add intrinsics for interleaving?

I am, plus some others for Complex vectorization. I just wanted to work out the kinks with this first example.

How do we feel about the intrinsic name: llvm.experimental.vector.extract.evens(...)?

I quite like the odd/even terminology, but would prefer to drop the s, as in: "extracting the even elements".

I like that too. I also like David's suggestion about deinterleave.even. I don't really have a strong opinion on either though. Does anyone feel strongly either way?

How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to also allow fixed width arguments as well. Currently the fixed width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.

I would favour not to add SCALABLE to that name, because there is nothing limiting these nodes from being used for fixed-width vectors.

Ok, that's fair. Right now I have an assert in ISelLowering to ensure only scalable types. That could really be removed though, since the UZP1 lowering would also work for fixed types. It might take a little work to clean up, but I don't foresee any problems.

cameron.mcinally marked 4 inline comments as done.

Address some of @sdesmalen's comments, but deferring name changes...

I like that too. I also like David's suggestion about deinterleave.even. I don't really have a strong opinion on either though. Does anyone feel strongly either way?

I quite like Dave's suggestion for deinterleave.even/odd and interleave.lo/hi, because now they are related (antonyms) and the names are still intuitive.

How do we feel about the ISDNode name: ISD::EXTRACT_EVENS_VECTOR? It could be argued that this set of nodes should have SCALABLE in their names, unless we plan to also allow fixed width arguments as well. Currently the fixed width intrinsics are canonicalized to the existing shuffle vector implementation, so they never reach this ISDNode.

I would favour not to add SCALABLE to that name, because there is nothing limiting these nodes from being used for fixed-width vectors.

Ok, that's fair. Right now I have an assert in ISelLowering to ensure only scalable types. That could really be removed though, since the UZP1 lowering would also work for fixed types. It might take a little work to clean up, but I don't foresee any problems.

It's probably fine either with/without the assert, since there is currently no way to test the fixed-width case.

Updated to @david-arm's suggested naming scheme...

sdesmalen added inline comments. · Jan 13 2021, 9:26 AM
llvm/docs/LangRef.rst
16206

AIUI, for scalable vectors the minimum number of elements must be a power of two in order to be able to legalize this operation without widening (not just an even number as I suggested earlier). Can you add that as a restriction?

llvm/lib/IR/Verifier.cpp
5176

Can you also check the constraint that the minimum number of elements must be a power of two for scalable vectors?

cameron.mcinally marked 2 inline comments as done.

Add known minimum number of elements restrictions...

llvm/lib/IR/Verifier.cpp
5176

I'm still getting up to speed with ElementCount, so I'm not sure that this is the best way to use it. Any experts?

A bit of a flyby review as I'm still on holidays but to my mind many of the restrictions being proposed for the new intrinsic seem purely down to the design decision of splitting the input vector across two operands. I understand this is how the underlying instructions work for SVE but that does not seem like a good enough reason to compromise the IR.

So my first questions are whether the IR and ISD interfaces need to match, and from an IR point of view, what is the expected usage? Is having two input operands going to result in the common case of having to "split" the result of a large load? I ask because I recall this being how InterleavedAccess worked with LLVM (i.e. one big load with a set of shuffles to extract the lanes).

My second question is: what are the code generation advantages of having multiple operands, weighed against the negatives? We know type legalisation is a negative, but I'm guessing the advantage is that it allows a simpler mapping to the underlying SVE instructions. The question is whether this is worth the cost.

By only having a single input vector I believe the current proposed type restrictions disappear, as widening becomes quite easy. The downside is that some of this type legalisation becomes more complex, but that feels worth it if it means fewer compromises. From an SVE point of view it seems pretty easy to rely on common type legalisation until you get to the point where the input vector is twice the size of the legal type, at which point we custom lower to the relevant AArch64-specific node, which mirrors how we handle things like ZERO_EXTEND today.

My final question relates to future usages and how the intrinsic's idiom scales. Taking the above InterleavedAccess example, there is a requirement to have a stride other than two, for example pixel data will want three or four. One route is to add an intrinsic for each option but I'm wondering if there's any appetite for a single generic intrinsic of the form:

<A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Where index and stride are required to be constant immediate values with "stride > 0" and "0 <= index < stride".

If it helps we could also initially restrict the range of stride as this is something that can be easily changed with improved code generation abilities. By this I mean with your current patch we can restrict it to being <=2 and still have distinct ISD nodes for these supported variants if that results in the better implementation.

A bit of a flyby review as I'm still on holidays but to my mind many of the restrictions being proposed for the new intrinsic seem purely down to the design decision of splitting the input vector across two operands. I understand this is how the underlying instructions work for SVE but that does not seem like a good enough reason to compromise the IR.

So my first questions are whether the IR and ISD interfaces need to match, and from an IR point of view, what is the expected usage?

My main IR use case is Complex vectorization. The vector Complex lowerings require vectors of just the reals and/or imags for the intermediate steps.

And also the trivial case of a stride 2 loop.

Is having two input operands going to result in the common case of having to "split" the result of a large load? I ask because I recall this being how InterleavedAccess worked with LLVM (i.e. one big load with a set of shuffles to extract the lanes).

Yeah, I could see where a large load would need to be split. That doesn't seem like too much of a headache though. We're going to do the two loads either way.

The two-operand intrinsics are my preferred choice, since when we vectorize loops we want to keep full vectors. We don't want to run the loop for 2x the iterations on half-full vectors, or pay the vector concatenation cost in the loop. This does map pretty well to SVE: we either do an LD2 if the operands are from memory and throw away one result, or a UZP if they're in registers. Not sure how this would map to RISCV.

If we have one operand intrinsics, we'd need two UZPs for the lo and hi halves, and then a splice. I suppose ISel could combine those two patterns into a two operand UZP though. Unless someone has a better lowering?

The two operand intrinsic could also be extended to accept one undef operand. So there is some flexibility there to get the same one operand intrinsic result.

My second question is: what are the code generation advantages of having multiple operands, weighed against the negatives? We know type legalisation is a negative, but I'm guessing the advantage is that it allows a simpler mapping to the underlying SVE instructions. The question is whether this is worth the cost.

By only having a single input vector I believe the current proposed type restrictions disappear, as widening becomes quite easy. The downside is that some of this type legalisation becomes more complex, but that feels worth it if it means fewer compromises. From an SVE point of view it seems pretty easy to rely on common type legalisation until you get to the point where the input vector is twice the size of the legal type, at which point we custom lower to the relevant AArch64-specific node, which mirrors how we handle things like ZERO_EXTEND today.

I don't have a strong sense for what the trade-offs are. Maybe you can elaborate once you're back from vacation.

My final question relates to future usages and how the intrinsic's idiom scales. Taking the above InterleavedAccess example, there is a requirement to have a stride other than two, for example pixel data will want three or four. One route is to add an intrinsic for each option but I'm wondering if there's any appetite for a single generic intrinsic of the form:

<A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Where index and stride are required to be constant immediate values with "stride > 0" and "0 <= index < stride".

If it helps we could also initially restrict the range of stride as this is something that can be easily changed with improved code generation abilities. By this I mean with your current patch we can restrict it to being <=2 and still have distinct ISD nodes for these supported variants if that results in the better implementation.

I like this idea a lot. Essentially a step vector shuffle. You could even roll splats into it with a 0 stride. Implementing it sounds pretty challenging though. Especially for an index >=2. Maybe I'm missing an easy solution, but that sounds like a lot of work to generalize.

Having said that, I wonder if we should revisit the idea of allowing shuffle vectors to accept step vector masks?

Matt added a subscriber: Matt. · Jan 19 2021, 9:14 AM

Having said that, I wonder if we should revisit the idea of allowing shuffle vectors to accept step vector masks?

At today's sync-up meeting, this met strong resistance. @ctetreau argued that we don't want to allow a stepvector constant as the shuffle vector mask operand, since that could lead to a slippery slope of constant initializer strings. Chris, correct my paraphrase if you'd like...

<A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Where index and stride are required to be constant immediate values with "stride > 0" and "0 <= index < stride".

Thinking about this some more, this wouldn't be too bad to implement. I'm okay going this route, unless anyone feels strongly against it...

In D94444#2497697, @paulwalker-arm wrote:
<A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Sorry for the slow reply. I'm just getting back to looking at this and now notice it is a unary shuffle. I'd like to see this as a binary shuffle. E.g.:

void foo(double res[16], double x[16], std::complex<double> vec[16]) {
  for (int i = 0; i < 16; i++)
    res[i] = x[i] + vec[i].real();
  return;
}

In the general vectorization case, we want to keep the vectors as full as possible on each iteration. I think the Complex part of the loop body should look like:

%lo = load %vec, 0
%hi = load %vec, 64
%reals = extract_elements(%lo, %hi, 0, 2)

And not splicing together two 1/2 width vectors:

%lo = load %vec, 0
%reals_lo = extract_elements(%lo, 0, 2)
%hi = load %vec, 64
%reals_hi = extract_elements(%hi, 0, 2)
%reals = concat(%reals_lo, %reals_hi)

And also not having 2x the loop trips on 1/2 width vectors:

%ld = load %vec, 0
%reals = extract_elements(%ld, 0, 2)

I'm hand-waving over some other obvious optimizations, but I think this illustrates the unary shuffle problem pretty well. Thoughts?

For now I'll just cover the IR side of things as the ISD node discussion raises different points and there's nothing to say they need to match.

If you take your code snippet (although I changed the loop trip count to 1024 to allow vectorisation) and look at the IR emitted by LoopVectorize, you'll see what I was referring to in my previous comment. You end up with the following snippet within vector.body:

%wide.vec = load <4 x double>, <4 x double>* %11, align 8, !tbaa !6
%wide.vec23 = load <4 x double>, <4 x double>* %13, align 8, !tbaa !6
%strided.vec = shufflevector <4 x double> %wide.vec, <4 x double> poison, <2 x i32> <i32 0, i32 2>
%strided.vec24 = shufflevector <4 x double> %wide.vec23, <4 x double> poison, <2 x i32> <i32 0, i32 2>

So today the loop is vectorised using vectors that are as full as possible; in this case the loop was also unrolled, hence the pair of loads and shuffles. Here LoopVectorize simply creates a double-length load and a matching shuffle to extract the even lanes. If the loop operated on the imaginary parts then there would also be a shuffle to extract the odd lanes. There's no concatenation or splicing involved, and the "large" load is trivial to code generate. For AArch64 there is also the InterleavedAccess pass that knows how to convert this logic to an aarch64.ld2 intrinsic call. This is something we'll want for SVE as well, although with the shufflevector replaced by an intrinsic it'll be simpler for SVE to detect, as InterleavedAccess is a tad complicated.

This is why I believe at the IR level we should have an intrinsic that mirrors this type of shuffle, and thus one that takes a single vector and extracts elements based on a simple pattern (i.e. odd or even...). Doing so means it'll be a drop-in replacement for the existing shufflevector usage, which is the goal. Note that if complex was changed to a three-element structure, then LoopVectorize would do the expected thing in creating a triple-wide load and shuffles to extract every third element starting at index 0, 1, or 2 based on the field in question.

Ok, I see where you are coming from now. LoopVectorize is keeping the shuffle result full by widening the load+shuffle to double width. LV's double-wide choice seems like a weird one, but I suppose if that sequence is codegen'd correctly, then it will work out.

It will be interesting to see how this codegens when the input lives in registers. But again I suppose that ISelLowering could straighten it out if needed.

In D94444#2497697, @paulwalker-arm wrote:
<A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Working through this now. @paulwalker-arm, have you given any thought to what happens to A below?

<vscale x A x double> llvm.experimental.vector.extract.elements(<vscale x 2 x double> %invec, i32 0, i32 4)

What should A be here? It should really be 0.5, but my understanding is we're not doing fractional VLs. I suppose that we could restrict stride to be >= the known VL.

I also wonder what A should be with odd strides, like 3. Any thoughts on that?

paulwalker-arm added a comment (edited). · Jan 26 2021, 3:22 AM

Considering:

<A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Suitable restrictions are that "A = B/stride" and that "B % stride == 0". When combined with the original restriction that "0 <= index < stride", I think this nicely joins up our use case of having an intrinsic that effectively extracts field "index" from a vector of structs[1] where each struct has "stride" fields.

[1] invec is not actually a vector of structs at the LLVM level, but logically this is what the vector represents.

There is the option to have no restrictions and allow a use case like:

<2 x Elt> llvm.experimental.vector.extract.elements(<vscale x 4 x Elt> %invec, i32 7, i32 13)

but I honestly think that's needlessly looking for trouble. Besides, this extreme use case has the same interface and so once a restricted variant is available it would be easy enough for others to soften restrictions if they see a genuine reason to do so. The important part is to ensure the intrinsic's return and parameter types are good enough to allow these future use cases, which I believe we've achieved in this instance.

[NOT READY FOR REVIEW]

Today is my last day at Cray/HPE, so taking a mid-development snapshot to be finished later.

This isn't working out to be as general as I'd like, though...

Any use of this intrinsic with a vector result half-width or smaller trips up Legalization/CodeGen. E.g. <vscale x 1 x X> or <vscale x 2 x f32>. I'm not sure if there are reasonable fixes for these problems or not yet.