This is an archive of the discontinued LLVM Phabricator instance.

Differential D94708

[IR] Introduce llvm.experimental.vector.splice intrinsic
ClosedPublic

Authored by c-rhodes on Jan 14 2021, 12:18 PM.

Details

Reviewers

sdesmalen
david-arm
fhahn
craig.topper
cameron.mcinally
CarolineConcatto

Commits

rG2750f3ed3155: [IR] Introduce llvm.experimental.vector.splice intrinsic

Summary

This patch introduces a new intrinsic @llvm.experimental.vector.splice
that constructs a vector of the same type as the two input vectors,
based on an immediate where the sign of the immediate distinguishes two
variants. A positive immediate specifies an index into the first vector
and a negative immediate specifies the number of trailing elements to
extract from the first vector.

For example:

@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>  ; index
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count
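
The two variants above can be modelled outside LLVM. The following is a minimal C++ sketch, not the patch's implementation: `splice_model` is a hypothetical name, elements are plain `char`s, and returning `std::nullopt` for an immediate outside [-VL, VL) is only a stand-in for the poison/undefined result discussed later in the review.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of the splice semantics described in the summary.
// std::nullopt stands in for "out of range" (the review discusses poison).
std::optional<std::vector<char>> splice_model(const std::vector<char>& v1,
                                              const std::vector<char>& v2,
                                              int imm) {
    assert(v1.size() == v2.size() && "both inputs must have the same type");
    const int vl = static_cast<int>(v1.size());
    if (imm < -vl || imm >= vl) return std::nullopt;

    // A negative immediate counts trailing elements of v1, so the start index
    // into concat(v1, v2) is vl + imm; a non-negative one is the index itself.
    const int start = imm < 0 ? vl + imm : imm;

    std::vector<char> concat(v1);
    concat.insert(concat.end(), v2.begin(), v2.end());
    return std::vector<char>(concat.begin() + start,
                             concat.begin() + start + vl);
}
```

With VL = 4, an index of 1 and a trailing element count of 3 (immediate -3) select the same window of the concatenation, which is why both examples above yield <B, C, D, E>.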

These intrinsics support both fixed and scalable vectors. The former is
lowered to a shufflevector to maintain existing behaviour, although while
the intrinsic is marked as experimental the recommended way to express
this operation for fixed-width vectors is to use shufflevector directly. For
scalable vectors, where it is not possible to express a shufflevector
mask for this operation, a new ISD node has been implemented.

This is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker and Cullen Rhodes.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Diff Detail

Event Timeline

c-rhodes created this revision. Jan 14 2021, 12:18 PM

Herald added subscribers: jdoerfert, hiraditya, kristof.beyls. Jan 14 2021, 12:18 PM

c-rhodes requested review of this revision. Jan 14 2021, 12:18 PM

Herald added a project: Restricted Project. Jan 14 2021, 12:18 PM

cameron.mcinally added inline comments. Jan 14 2021, 12:37 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1093	Is this change required for the splice patch? I wonder if it should be broken out to its own commit.

Harbormaster completed remote builds in B85213: Diff 316728. Jan 14 2021, 1:09 PM

c-rhodes added inline comments. Jan 15 2021, 4:19 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1093	> Is this change required for the splice patch? I wonder if it should be broken out to its own commit.
	This fixes a selection error I hit when splicing predicates. The promotion of predicates uses a truncate which gets lowered to a setcc that crashes on operands of type `VT`, e.g.:
	t17: nxv2i64 = and t29, t30
	t31: nxv2i64 = AArch64ISD::DUP Constant:i64<0>
	t19: nxv2i1 = setcc t17, t31, setne:ch
	I can split this into a separate patch.

In D94444, @paulwalker-arm proposed a more generic extract vector intrinsic that accepts an index and stride. Now I'm wondering if we should just have a generic scalable shuffle vector intrinsic to handle all these operations under one intrinsic.

That idea doesn't need to hold up this Diff, but it might be something to consider...

craig.topper added inline comments. Jan 15 2021, 4:33 PM

llvm/docs/LangRef.rst
16185	Does this mention that %trailing.elt must be a constant?
llvm/include/llvm/IR/Intrinsics.td
1638	Should this have DefaultAttrs?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1258	Can you use setOperationPromotedToType here?

In D94708#2501255, @cameron.mcinally wrote:

In D94444, @paulwalker-arm proposed a more generic extract vector intrinsic that accepts an index and stride. Now I'm wondering if we should just have a generic scalable shuffle vector intrinsic to handle all these operations under one intrinsic.

That idea doesn't need to hold up this Diff, but it might be something to consider...

I don't believe that can work, for the same reasons that having a single LLVM instruction does not work. Each shuffle has different requirements. Some require one operand whilst others require two, and some require additional scalar operands when others do not.

paulwalker-arm added inline comments. Jan 16 2021, 4:18 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.h
903	It looks like you're missing the implementation of this function and some associated isel patterns, which explains the poor code generation for the SVE _1 test variants. Or are you planning to add those under a second patch, in which case this function declaration should be removed from this patch.

c-rhodes added inline comments. Jan 16 2021, 4:27 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.h
903	> It looks like you're missing the implementation of this function and some associated isel patterns, which explains the poor code generation for the SVE _1 test variants. Or are you planning to add those under a second patch, in which case this function declaration should be removed from this patch.
	The plan is to upstream those patterns separately. They were initially part of this patch, but they depended on some changes we have downstream to use the SIMD variant of INSR when the scalar argument comes from a vector extract. It looks like I missed this declaration; I'll remove it.

Matt added a subscriber: Matt. Jan 19 2021, 9:14 AM

Changes:

Remove unused function declaration for LowerVECTOR_SPLICE.
Clarify trailing.elts must be an integer constant in LangRef.
Use DefaultAttrs for intrinsic.
Use setOperationPromotedToType.

c-rhodes marked 3 inline comments as done. Jan 20 2021, 6:39 AM

c-rhodes added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1093	> Is this change required for the splice patch? I wonder if it should be broken out to its own commit.
	> This fixes a selection error I hit when splicing predicates. The promotion of predicates uses a truncate which gets lowered to a setcc that crashes on operands of type `VT`, e.g.:
	> t17: nxv2i64 = and t29, t30
	> t31: nxv2i64 = AArch64ISD::DUP Constant:i64<0>
	> t19: nxv2i1 = setcc t17, t31, setne:ch
	> I can split this into a separate patch.
	I tried splitting this out but I couldn't figure out how to test it, so I've left it in this patch. What I found was that the setcc being generated by the splice occurs after result type legalization, so without the above a setcc returning an `nxv2i1` and taking two `nxv2i64` vectors is considered legal at this point in selection. With the above it falls into `SelectionDAGLegalize::LegalizeOp`, which considers the legality based on the operand VTs, so setcc then gets custom lowered to the target-specific predicated variant `AArch64ISD::SETCC_MERGE_ZERO` that can be selected.
1258	> Can you use setOperationPromotedToType here?
	That's nice, wasn't aware of that, thanks.

Are there plans to use this for fixed vectors as well? If not, can this be restricted to scalable vectors only? It seems like for fixed vectors, preferring the shufflevector versions would be beneficial, to avoid having to update all transforms looking at shuffles.

Otherwise, this should probably have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

sdesmalen added inline comments. Jan 20 2021, 7:15 AM

llvm/docs/LangRef.rst
16215	I expect that we want to support two flavours of the splice intrinsic:
	// in flavour1 an immediate of '3' translates to the number of trailing elements,
	// e.g. start index of VL - 3 == 'B'.
	experimental.vector.splice.flavour1(<A,B,C,D>, <E,F,G,H>, 3) ==> <B, C, D, E>
	// in flavour2 an immediate of '1' translates to the start index 1 == 'B'.
	experimental.vector.splice.flavour2(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
	This patch implements only flavour1, and because trailing.elts should be an immediate it is not possible to express flavour2. Do we want to add flavour2 as well at some point? And if so, what would it be named?
	Alternatively, it seems possible to express both with the current splice intrinsic where the sign of the immediate distinguishes the two flavours, e.g. a start index of `-3` would 'wrap' by VL to index 1, and thus have the same meaning as `3` in flavour1, whereas a positive immediate of `1` would mean index 1, as in flavour2.

In D94708#2509610, @fhahn wrote:

Are there plans to use this for fixed vectors as well? If not, can this be restricted to scalable vectors only? It seems like for fixed vectors, preferring the shufflevector versions would be beneficial, to avoid having to update all transforms looking at shuffles.

Otherwise, this should probably have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

The named shufflevector intrinsics accept both fixed and scalable vectors but for fixed they map to existing shufflevector. That's a good point, thanks for raising it. Converting to a shufflevector as an IR transform, as insert/extract do, is probably the right thing to do here.

paulwalker-arm added inline comments. Jan 20 2021, 12:50 PM

llvm/docs/LangRef.rst
16215	Overloading the usage based on the sign of the immediate sounds like a worthwhile upgrade to me.
16217–16218	This restriction does not make sense to me. The logical requirement is for the trailing elements to be less than or equal to the full runtime vector length. Then I'd say that counts beyond this range result in poison.

In D94708#2509677, @c-rhodes wrote:

In D94708#2509610, @fhahn wrote:

Are there plans to use this for fixed vectors as well? If not, can this be restricted to scalable vectors only? It seems like for fixed vectors, preferring the shufflevector versions would be beneficial, to avoid having to update all transforms looking at shuffles.

Otherwise, this should probably have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

The named shufflevector intrinsics accept both fixed and scalable vectors but for fixed they map to existing shufflevector.

As specified here yes, but I was wondering if they actually need to support fixed-width vectors? Is there a reason to use the intrinsics with fixed vectors instead of shuffles?

fhahn mentioned this in D94883: [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse. Jan 21 2021, 8:40 AM

CarolineConcatto added a reviewer: CarolineConcatto. Jan 21 2021, 8:51 AM

timsmith78 added a subscriber: timsmith78. Jan 21 2021, 9:23 AM

FWIW, we have a similar intrinsic in our downstream compiler for RISC-V. We call it experimental.vector.slideleftfill, and it is the same as flavour 2 of this (except that the "offset" need not be an immediate), as suggested by @sdesmalen. We also have a separate predicated version (experimental.vector.vp.slideleftfill), which takes additional arguments for the explicit vector length of the two vectors.

llvm/docs/LangRef.rst
16216	Why is it required for `trailing.elts` to be an immediate? I imagine there could be a scenario where this value is unknown at compile time, for instance a recurrence where the order of recurrence is determined at runtime.

paulwalker-arm added inline comments. Jan 27 2021, 4:04 AM

llvm/docs/LangRef.rst
16216	At this stage we don't want to muddy the waters by introducing new shuffle behaviours to LLVM, but rather just extend the existing shuffle requirements to cover scalable vectors. Today `shufflevector` only supports a literal mask and thus at this stage we are enforcing this same requirement to the intrinsic variants.

Changes:

As proposed by @sdesmalen, the intrinsic now supports two variants based on the sign of the immediate, where a negative immediate is a trailing element count and a positive immediate an index.
Following the discussion on the reverse intrinsic (D94883) around whether named shufflevector intrinsics should support fixed vectors, I've opted for the same approach to make it explicit in the LangRef that whilst these instructions are experimental shufflevector should be used for fixed-width vectors. The changes to InstCombineCalls to map this intrinsic to shufflevector as an IR transform have been dropped.
ExpandVectorSpliceThroughStack has been moved to TLI under expandVectorSplice which is reused by DAGTypeLegalizer::SplitVecRes_VECTOR_SPLICE.

In D94708#2512706, @fhahn wrote:

In D94708#2509677, @c-rhodes wrote:

In D94708#2509610, @fhahn wrote:

Are there plans to use this for fixed vectors as well? If not, can this be restricted to scalable vectors only? It seems like for fixed vectors, preferring the shufflevector versions would be beneficial, to avoid having to update all transforms looking at shuffles.

Otherwise, this should probably have some tests for platforms other than AArch64 and also support in GlobalISel, which is the default on AArch64 with -O0 IIRC (or do the transform to shuffles as an IR transform).

The named shufflevector intrinsics accept both fixed and scalable vectors but for fixed they map to existing shufflevector.

As specified here yes, but I was wondering if they actually need to support fixed-width vectors? Is there a reason to use the intrinsics with fixed vectors instead of shuffles?

I noticed there was a similar discussion on D94883. As mentioned, I've taken the same approach and made it explicit in the LangRef that shufflevector should be used for fixed-width vectors instead of this intrinsic whilst it's experimental.

llvm/docs/LangRef.rst
16217–16218	> This restriction does not make sense to me. The logical requirement is for the trailing elements to be less than or equal to the full runtime vector length. Then I'd say that counts beyond this range result in poison.
	My mistake, you're right, it should be the full runtime VL rather than the minimum number of elements. LangRef has been updated.

Just a few nits, will have a closer look at the patch later.

llvm/docs/LangRef.rst
16200–16206	nit: The signed immediate, modulo the number of elements in the vector, is the index into the first vector from which to extract the result value. This means conceptually that for a positive immediate, a vector is extracted from concat(%vec1, %vec2) starting at index `imm`, whereas for a negative immediate, it extracts `imm` trailing elements from the first vector, and the remaining elements from %vec2.
16209	nit: insert comma.
16227	It's better to say that if the immediate value is outside this range, the result is a poison value. That leaves it up to the implementation how to handle it.
16227	where VL is the runtime vector length of the source/result vector.

sdesmalen added inline comments. Feb 19 2021, 6:44 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5511–5518	is there a way to split the load using existing code?
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10854	just `Imm % NumElts`; ?
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8657 ↗	(On Diff #324619)	You'll need to do a similar clamping for Imm > 0 (or actually, Imm > MinKnownVL) , to avoid loading beyond the allocated stack object.
llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll
20	nit: not sure if there is a lot of value to do this for each possible element-type, if you do two (i8 and some other type) that may be sufficient.

Address comments

c-rhodes marked 6 inline comments as done. Feb 19 2021, 7:51 AM

c-rhodes added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5511–5518	> is there a way to split the load using existing code?
	Splitting the full load from `expandVectorSplice` with `SplitVecRes_LOAD` seems to work.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10854	> just `Imm % NumElts`; ?
	I tried that at first but found the behaviour isn't correct for negative immediates. For example, with a trailing element count of `-15` and 16 elements, `-15 % 16 = -15`, so we end up with this mask: `[-15,-14,-13,-12,-11,-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0]`. From what I read, the sign of the modulo result is implementation-defined if one or both of the operands are negative.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8657 ↗	(On Diff #324619)	> You'll need to do a similar clamping for Imm > 0 (or actually, Imm > MinKnownVL), to avoid loading beyond the allocated stack object.
	For the index, `getVectorElementPointer` does the clamping via `clampDynamicVectorIndex`.
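
The `Imm % NumElts` exchange above can be checked in isolation. This is a small sketch with a hypothetical helper name, `wrap_index`; it is not the code in SelectionDAGBuilder.cpp. In C++11 and later, integer division truncates toward zero, so the remainder takes the sign of the dividend and `-15 % 16` is `-15` (before C++11 the sign was implementation-defined). Adding the element count before taking the remainder keeps the left operand non-negative for any immediate in [-NumElts, NumElts).

```cpp
#include <cassert>

// Hypothetical helper: map a signed splice immediate in
// [-num_elts, num_elts) to a non-negative start index.
int wrap_index(int imm, int num_elts) {
    // num_elts + imm is >= 0 for every imm >= -num_elts, so the
    // remainder below is never negative.
    return (num_elts + imm) % num_elts;
}
```

For 16 elements, `wrap_index(-15, 16)` gives start index 1, the same window as an index immediate of 1.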

paulwalker-arm added inline comments. Feb 19 2021, 11:08 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
937	This doesn't look great to me because it ties the hands of expandVectorSplice, whose name does not suggest it must return a load. If you prefer not to explicitly create the split loads then it seems more natural, and future proof, if the result of expandVectorSplice is split using a pair of EXTRACT_SUBVECTOR operations.

bin.cheng-ali added a subscriber: bin.cheng-ali. Feb 19 2021, 10:51 PM

bin.cheng-ali added inline comments.

llvm/docs/LangRef.rst
16207	One nit. Given `imm` is negative here, should this be "it extracts `-imm` trailing elements from the first vector, ..."?
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
10854	Then `(NumElts + Imm) % NumElts` ?
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8623 ↗	(On Diff #324983)	Sorry for my stupid question: this stores two vector registers and loads from the middle, so does endianness matter here? How does LLVM make sure the correct sequence of elements is loaded? Thanks in advance.

paulwalker-arm added inline comments. Feb 22 2021, 3:27 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8623 ↗	(On Diff #324983)	There's no such thing as a stupid question :) These are load/store operations on vector types so endianness only plays a role in how each element is stored and not in the order of the elements. That's to say, element N will be at the same location in memory regardless of endianness, however, the bytes that make up element N will be laid out differently. The index used to splice the vectors is also element based, which means there are no partial element accesses and thus no endianness issues.
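
The endianness argument above can be illustrated with a self-contained C++ model. This is a sketch under stated assumptions, not LLVM code: `store_elements`, `load_elements`, and `splice_via_memory` are hypothetical names, elements are fixed at 32 bits, and memory is a plain byte vector. Byte order changes how each element is laid out, but element N always starts at byte offset 4 * N, so an element-indexed reload returns the same values either way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Endian { Little, Big };

// Bit position of byte `b` (0..3, in memory order) within a 32-bit element.
static int byte_shift(Endian e, int b) {
    return e == Endian::Little ? 8 * b : 8 * (3 - b);
}

// Append the elements of `v` to `mem`, element 0 first, using byte order `e`.
void store_elements(const std::vector<std::uint32_t>& v, Endian e,
                    std::vector<std::uint8_t>& mem) {
    for (std::uint32_t x : v)
        for (int b = 0; b < 4; ++b)
            mem.push_back(static_cast<std::uint8_t>(x >> byte_shift(e, b)));
}

// Read `count` elements starting at element index `first`.
std::vector<std::uint32_t> load_elements(const std::vector<std::uint8_t>& mem,
                                         Endian e, std::size_t first,
                                         std::size_t count) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = first; i < first + count; ++i) {
        std::uint32_t x = 0;
        for (int b = 0; b < 4; ++b)
            x |= static_cast<std::uint32_t>(mem[4 * i + b]) << byte_shift(e, b);
        out.push_back(x);
    }
    return out;
}

// Store v1 then v2 back to back and reload v1.size() elements from `start`.
std::vector<std::uint32_t> splice_via_memory(
    const std::vector<std::uint32_t>& v1, const std::vector<std::uint32_t>& v2,
    std::size_t start, Endian e) {
    std::vector<std::uint8_t> mem;
    store_elements(v1, e, mem);
    store_elements(v2, e, mem);
    return load_elements(mem, e, start, v1.size());
}
```

Because the splice offset is a whole number of elements, the load never straddles part of an element, which is the property the reply relies on.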

Address comments

c-rhodes marked 3 inline comments as done. Feb 22 2021, 7:11 AM

c-rhodes added inline comments. Feb 22 2021, 7:17 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
937	> This doesn't look great to me because it ties the hands of expandVectorSplice, whose name does not suggest it must return a load. If you prefer not to explicitly create the split loads then it seems more natural, and future proof, if the result of expandVectorSplice is split using a pair of EXTRACT_SUBVECTOR operations.
	Fair point, I've used EXTRACT_SUBVECTOR instead. I should point out warnings were being generated for the `nxv16f32` splitvec tests by the `getVectorNumElements` call in `DAGTypeLegalizer::SplitVecRes_EXTRACT_SUBVECTOR`. To remove the warnings I changed it to `getVectorMinNumElements`.

bin.cheng-ali added inline comments. Feb 23 2021, 1:59 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8623 ↗	(On Diff #324983)	Thanks very much for the explanation. I assume this is defined in the Arm architecture reference manual; however, I noticed there is a supplementary rule for SVE: "For unpredicated SVE vector register loads and stores, the vector is treated as containing byte elements that are transferred in increasing element number order without any endianness conversion." IIUC, this rule would apply here? Of course there is no endianness issue either way.

HLJ2009 added a subscriber: HLJ2009. Feb 23 2021, 2:04 AM

Just a few more nits.

llvm/docs/LangRef.rst
16229	nit: `s/, for/. For/`
llvm/include/llvm/CodeGen/ISDOpcodes.h
560	nit: `s/T is [..] and//`
560	nit: `s/out-of-bounds/not within the range [-VL, VL)/`
561–562	Please remove this restriction for scalable vectors. The implementation (by clamping to avoid a runtime crash) should not dictate the specification. It is sufficient to say that if IMM is out of range that the result value is undefined.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8644 ↗	(On Diff #325438)	nit: because the Imm < 0 block is a lot bigger, maybe you can rewrite it as:
	if (Imm >= 0) {
	  // Load back the required element.
	  StackPtr = getVectorElementPointer(DAG, StackPtr, VT, Node->getOperand(2));
	  // Load the spliced result
	  return DAG.getLoad(VT, DL, StoreV2, StackPtr, MachinePointerInfo::getUnknownStack(MF));
	}
	// Handle Imm < 0 case here.
8657 ↗	(On Diff #324619)	Thanks for confirming. Can you just add a comment that clarifies this?
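
The clamping discussed for the stack-based expansion can be sketched as follows. This is a hypothetical C++ model, not the code in TargetLowering.cpp: `clamped_start` and `splice_through_stack` are made-up names, and LLVM's own `clampDynamicVectorIndex` may clamp differently. The assumption modelled here is that the immediate is a compile-time constant while the vector length of a scalable vector is only known at run time, so the start offset is clamped to keep the reload inside the 2 * VL element stack slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: element offset into a stack slot holding v1 followed by v2
// (2 * vl elements), clamped so that a vl-element load stays inside the slot.
std::size_t clamped_start(long long imm, std::size_t vl) {
    if (imm >= 0)  // index variant: never start beyond the end of v1
        return std::min(static_cast<std::size_t>(imm), vl);
    // Trailing-count variant: take at most vl elements from the end of v1.
    std::size_t trailing = std::min(static_cast<std::size_t>(-imm), vl);
    return vl - trailing;
}

std::vector<int> splice_through_stack(const std::vector<int>& v1,
                                      const std::vector<int>& v2,
                                      long long imm) {
    assert(v1.size() == v2.size());
    const std::size_t vl = v1.size();
    std::vector<int> slot(v1);                      // "store" v1
    slot.insert(slot.end(), v2.begin(), v2.end());  // "store" v2 right after it
    const std::size_t start = clamped_start(imm, vl);
    return std::vector<int>(slot.begin() + start, slot.begin() + start + vl);
}
```

The clamp only keeps the memory access in bounds. As the review notes, the specification should not promise a particular result for immediates outside [-VL, VL), so callers should not rely on what this sketch returns in that case.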

Address @sdesmalen's comments

c-rhodes marked 6 inline comments as done. Mar 3 2021, 3:52 AM

The patch looks fine to me, cheers @c-rhodes.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8643 ↗	(On Diff #327738)	nit: s/This/getVectorElementPointer/

This revision is now accepted and ready to land. Mar 3 2021, 4:13 AM

Harbormaster completed remote builds in B91782: Diff 327738. Mar 3 2021, 7:39 AM

Thanks all for reviewing. I'll land this early next week unless there are any objections before then, cheers.

This revision was landed with ongoing or failed builds. Mar 9 2021, 2:45 AM

Closed by commit rG2750f3ed3155: [IR] Introduce llvm.experimental.vector.splice intrinsic (authored by c-rhodes).

This revision was automatically updated to reflect the committed changes.

c-rhodes added a commit: rG2750f3ed3155: [IR] Introduce llvm.experimental.vector.splice intrinsic.

vkmr mentioned this in D103898: [VP] Vector predicated vector splice intrinsic. Jun 8 2021, 7:46 AM

simoll mentioned this in rG72a08c0b9404: [VP] Vector predicated vector splice intrinsic. Sep 29 2021, 1:44 AM

Jimerlife mentioned this in D128717: [RISCV] Change VECTOR_SPLICE mask operation from expand to promote. Jun 28 2022, 3:13 AM

Revision Contents

Path                                                          Size
llvm/docs/LangRef.rst                                         36 lines
llvm/include/llvm/CodeGen/ISDOpcodes.h                        5 lines
llvm/include/llvm/IR/Intrinsics.td                            7 lines
llvm/include/llvm/Target/TargetSelectionDAG.td                4 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp                 71 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp        11 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h                 2 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp         64 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h           1 line
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp         31 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp          1 line
llvm/lib/CodeGen/TargetLoweringBase.cpp                       3 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h                 1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp               10 lines
llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll       118 lines
llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll        631 lines

Diff 316728

llvm/docs/LangRef.rst

	vector length of the result type. If the result type is a scalable vector,
	``idx`` is first scaled by the result type's runtime scaling factor. Elements
	``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector
	indices. If this condition cannot be determined statically but is false at
	runtime, then the result vector is undefined. The ``idx`` parameter must be a
	vector index constant type (for most targets this will be an integer pointer
	type).

				'``llvm.experimental.vector.splice``' Intrinsic
				craig.topper: Does this mention that %trailing.elt must be a constant?
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""
				This is an overloaded intrinsic.

				::

				declare <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %trailing.elts)
				declare <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %trailing.elts)

				Overview:
				"""""""""

				The '``llvm.experimental.vector.splice.*``' intrinsics construct a vector by
				concatenating the trailing elements from the first input vector with the
				starting elements of the second input vector, returning a vector of the same
				type as the input vectors.

				For example:

				sdesmalen: nit: The signed immediate, modulo the number of elements in the vector, is the index into the first vector from which to extract the result value. This means conceptually that for a positive immediate, a vector is extracted from concat(%vec1, %vec2) starting at index `imm`, whereas for a negative immediate, it extracts `imm` trailing elements from the first vector, and the remaining elements from %vec2.
				.. code-block:: text
				bin.cheng-ali: One nit. Given `imm` is negative here, should this be "it extracts `-imm` trailing elements from the first vector, ..."?

				experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 3) ==> <B, C, D, E>
				sdesmalen: nit: insert comma.


				Arguments:
				""""""""""

				The first two operands are vectors with the same type. The third argument
				sdesmalen: I expect that we want to support two flavours of the splice intrinsic:
					// in flavour1 an immediate of '3' translates to the number of trailing elements,
					// e.g. start index of VL - 3 == 'B'.
					experimental.vector.splice.flavour1(<A,B,C,D>, <E,F,G,H>, 3) ==> <B, C, D, E>
					// in flavour2 an immediate of '1' translates to the start index 1 == 'B'.
					experimental.vector.splice.flavour2(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
					This patch implements only flavour1, and because trailing.elts should be an immediate it is not possible to express flavour2. Do we want to add flavour2 as well at some point? And if so, what would it be named?
					Alternatively, it seems possible to express both with the current splice intrinsic where the sign of the immediate distinguishes the two flavours, e.g. a start index of `-3` would 'wrap' by VL to index 1, and thus have the same meaning as `3` in flavour1, whereas a positive immediate of `1` would mean index 1, as in flavour2.
				paulwalker-arm: Overloading the usage based on the sign of the immediate sounds like a worthwhile upgrade to me.
				``trailing.elts`` specifies the number of trailing elements to extract from the
				vkmr: Why is it required for `trailing.elts` to be an immediate? I imagine there could be a scenario where this value is unknown at compile time, for instance a recurrence where the order of recurrence is determined at runtime.
				paulwalker-arm: At this stage we don't want to muddy the waters by introducing new shuffle behaviours to LLVM, but rather just extend the existing shuffle requirements to cover scalable vectors. Today `shufflevector` only supports a literal mask and thus at this stage we are enforcing this same requirement to the intrinsic variants.
				first vector, if this exceeds the known minimum number of elements in the first
				vector, it is clamped.
				paulwalker-arm: This restriction does not make sense to me. The logical requirement is for the trailing elements to be less than or equal to the full runtime vector length. Then I'd say that counts beyond this range result in poison.
				c-rhodes (author): > This restriction does not make sense to me. The logical requirement is for the trailing elements to be less than or equal to the full runtime vector length. Then I'd say that counts beyond this range result in poison.
					My mistake, you're right, it should be the full runtime VL rather than the minimum number of elements. LangRef has been updated.


	Matrix Intrinsics
	-----------------

	Operations on matrixes requiring shape information (like number of rows/columns
	or the memory layout) can be expressed using the matrix intrinsics. These
	intrinsics require matrix dimensions to be passed as immediate arguments, and
	matrixes are passed and returned as vectors. This means that for a ``R`` x
				sdesmalen: It's better to say that if the immediate value is outside this range, the result is a poison value. That leaves it up to the implementation how to handle it.
				sdesmalen: where VL is the runtime vector length of the source/result vector.
	``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the
	corresponding vector, with indices starting at 0. Currently column-major layout
				sdesmalen: nit: `s/, for/. For/`
	is assumed. The intrinsics support both integer and floating point matrixes.


	'``llvm.matrix.transpose.*``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:
	"""""""

llvm/include/llvm/CodeGen/ISDOpcodes.h

Show First 20 Lines • Show All 543 Lines • ▼ Show 20 Lines	enum NodeType {
/// VEC1/VEC2. A VECTOR_SHUFFLE node also contains an array of constant int		/// VEC1/VEC2. A VECTOR_SHUFFLE node also contains an array of constant int
/// values that indicate which value (or undef) each result element will		/// values that indicate which value (or undef) each result element will
/// get. These constant ints are accessible through the		/// get. These constant ints are accessible through the
/// ShuffleVectorSDNode class. This is quite similar to the Altivec		/// ShuffleVectorSDNode class. This is quite similar to the Altivec
/// 'vperm' instruction, except that the indices must be constants and are		/// 'vperm' instruction, except that the indices must be constants and are
/// in terms of the element size of VEC1/VEC2, not in terms of bytes.		/// in terms of the element size of VEC1/VEC2, not in terms of bytes.
VECTOR_SHUFFLE,		VECTOR_SHUFFLE,

		// VECTOR_SPLICE(VEC1, VEC2, IMM) - Returns a vector, of the same type as
		// VEC1/VEC2, whose elements are shuffled using the following algorithm:
		// RESULT[i] = CONCAT_VECTORS(VEC1,VEC2)[VEC1.ElementCount - IMM + i]
		VECTOR_SPLICE,

/// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a		/// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a
/// scalar value into element 0 of the resultant vector type. The top		/// scalar value into element 0 of the resultant vector type. The top
/// elements 1 to N-1 of the N-element vector are undefined. The type		/// elements 1 to N-1 of the N-element vector are undefined. The type
/// of the operand must match the vector element type, except when they		/// of the operand must match the vector element type, except when they
		sdesmalen: nit: `s/T is [..] and//`
		sdesmalen: nit: `s/out-of-bounds/not within the range [-VL, VL)/`
/// are integer types. In this case the operand is allowed to be wider		/// are integer types. In this case the operand is allowed to be wider
/// than the vector element type, and is implicitly truncated to it.		/// than the vector element type, and is implicitly truncated to it.
		sdesmalen: Please remove this restriction for scalable vectors. The implementation (by clamping to avoid a runtime crash) should not dictate the specification. It is sufficient to say that if IMM is out of range that the result value is undefined.
SCALAR_TO_VECTOR,		SCALAR_TO_VECTOR,

/// SPLAT_VECTOR(VAL) - Returns a vector with the scalar value VAL		/// SPLAT_VECTOR(VAL) - Returns a vector with the scalar value VAL
/// duplicated in all lanes. The type of the operand must match the vector		/// duplicated in all lanes. The type of the operand must match the vector
/// element type, except when they are integer types. In this case the		/// element type, except when they are integer types. In this case the
/// operand is allowed to be wider than the vector element type, and is		/// operand is allowed to be wider than the vector element type, and is
/// implicitly truncated to it.		/// implicitly truncated to it.
SPLAT_VECTOR,		SPLAT_VECTOR,
[...]
enum NodeType {
ADJUST_TRAMPOLINE,		ADJUST_TRAMPOLINE,

/// TRAP - Trapping instruction		/// TRAP - Trapping instruction
TRAP,		TRAP,

/// DEBUGTRAP - Trap intended to get the attention of a debugger.		/// DEBUGTRAP - Trap intended to get the attention of a debugger.
DEBUGTRAP,		DEBUGTRAP,

/// UBSANTRAP - Trap with an immediate describing the kind of sanitizer failure.		/// UBSANTRAP - Trap with an immediate describing the kind of sanitizer failure.
		Lint: Pre-merge checks: clang-format: please reformat the code
		- /// UBSANTRAP - Trap with an immediate describing the kind of sanitizer failure.
		+ /// UBSANTRAP - Trap with an immediate describing the kind of sanitizer
		+ /// failure.
UBSANTRAP,		UBSANTRAP,

/// PREFETCH - This corresponds to a prefetch intrinsic. The first operand		/// PREFETCH - This corresponds to a prefetch intrinsic. The first operand
/// is the chain. The other operands are the address to prefetch,		/// is the chain. The other operands are the address to prefetch,
/// read / write specifier, locality specifier and instruction / data cache		/// read / write specifier, locality specifier and instruction / data cache
/// specifier.		/// specifier.
PREFETCH,		PREFETCH,
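
The splice semantics described in the patch summary and in the VECTOR_SPLICE comment can be sketched as a small reference model. This is only an illustration on `std::vector`, not LLVM API: the name `spliceModel` is hypothetical, and returning `std::nullopt` stands in for the "poison" result that sdesmalen suggests for immediates outside the range [-VL, VL).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical reference model of the splice operation (not LLVM API).
// A non-negative Imm is an index into V1; a negative Imm is the number of
// trailing elements of V1 to keep. An empty optional models a poison result.
template <typename T>
std::optional<std::vector<T>> spliceModel(const std::vector<T> &V1,
                                          const std::vector<T> &V2,
                                          int64_t Imm) {
  assert(V1.size() == V2.size() && "operands must have the same type");
  const int64_t VL = static_cast<int64_t>(V1.size());

  // Assumed valid range, following the review comments: [-VL, VL).
  if (Imm < -VL || Imm >= VL)
    return std::nullopt;

  // CONCAT_VECTORS(V1, V2)
  std::vector<T> Concat(V1);
  Concat.insert(Concat.end(), V2.begin(), V2.end());

  // Both variants reduce to a start offset into the concatenation.
  const int64_t Start = Imm >= 0 ? Imm : VL + Imm;
  return std::vector<T>(Concat.begin() + Start, Concat.begin() + Start + VL);
}
```

With `<A,B,C,D>` and `<E,F,G,H>` as inputs, both `Imm = 1` and `Imm = -3` select the window starting at offset 1, which matches the two examples in the summary.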


llvm/include/llvm/IR/Intrinsics.td

	def int_experimental_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],			def int_experimental_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
	[LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],			[LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],
	[IntrNoMem, ImmArg<ArgIndex<2>>]>;			[IntrNoMem, ImmArg<ArgIndex<2>>]>;

	def int_experimental_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],			def int_experimental_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty, llvm_i64_ty],			[llvm_anyvector_ty, llvm_i64_ty],
	[IntrNoMem, ImmArg<ArgIndex<1>>]>;			[IntrNoMem, ImmArg<ArgIndex<1>>]>;

				//===---------- Named shufflevector intrinsics ------===//
				def int_experimental_vector_splice : Intrinsic<[llvm_anyvector_ty],
				craig.topper: Should this have DefaultAttrs?
				[LLVMMatchType<0>,
				LLVMMatchType<0>,
				llvm_i32_ty],
				[IntrNoMem, ImmArg<ArgIndex<2>>]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Target-specific intrinsics			// Target-specific intrinsics
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	include "llvm/IR/IntrinsicsPowerPC.td"			include "llvm/IR/IntrinsicsPowerPC.td"
	include "llvm/IR/IntrinsicsX86.td"			include "llvm/IR/IntrinsicsX86.td"

llvm/include/llvm/Target/TargetSelectionDAG.td

	def SDTMaskedLoad: SDTypeProfile<1, 4, [ // masked load			def SDTMaskedLoad: SDTypeProfile<1, 4, [ // masked load
	SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisPtrTy<2>, SDTCisVec<3>, SDTCisSameAs<0, 4>,			SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisPtrTy<2>, SDTCisVec<3>, SDTCisSameAs<0, 4>,
	SDTCisSameNumEltsAs<0, 3>			SDTCisSameNumEltsAs<0, 3>
	]>;			]>;

	def SDTVecShuffle : SDTypeProfile<1, 2, [			def SDTVecShuffle : SDTypeProfile<1, 2, [
	SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>			SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>
	]>;			]>;
				def SDTVecSlice : SDTypeProfile<1, 3, [ // vector splice
				SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>, SDTCisInt<3>
				]>;
	def SDTVecExtract : SDTypeProfile<1, 2, [ // vector extract			def SDTVecExtract : SDTypeProfile<1, 2, [ // vector extract
	SDTCisEltOfVec<0, 1>, SDTCisPtrTy<2>			SDTCisEltOfVec<0, 1>, SDTCisPtrTy<2>
	]>;			]>;
	def SDTVecInsert : SDTypeProfile<1, 3, [ // vector insert			def SDTVecInsert : SDTypeProfile<1, 3, [ // vector insert
	SDTCisEltOfVec<2, 1>, SDTCisSameAs<0, 1>, SDTCisPtrTy<3>			SDTCisEltOfVec<2, 1>, SDTCisSameAs<0, 1>, SDTCisPtrTy<3>
	]>;			]>;
	def SDTVecReduce : SDTypeProfile<1, 1, [ // vector reduction			def SDTVecReduce : SDTypeProfile<1, 1, [ // vector reduction
	SDTCisInt<0>, SDTCisVec<1>			SDTCisInt<0>, SDTCisVec<1>
	[...]
	def ld : SDNode<"ISD::LOAD" , SDTLoad,			def ld : SDNode<"ISD::LOAD" , SDTLoad,
	[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;			[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
	def st : SDNode<"ISD::STORE" , SDTStore,			def st : SDNode<"ISD::STORE" , SDTStore,
	[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;			[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
	def ist : SDNode<"ISD::STORE" , SDTIStore,			def ist : SDNode<"ISD::STORE" , SDTIStore,
	[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;			[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

	def vector_shuffle : SDNode<"ISD::VECTOR_SHUFFLE", SDTVecShuffle, []>;			def vector_shuffle : SDNode<"ISD::VECTOR_SHUFFLE", SDTVecShuffle, []>;
				def vector_splice : SDNode<"ISD::VECTOR_SPLICE", SDTVecSlice, []>;
	def build_vector : SDNode<"ISD::BUILD_VECTOR", SDTypeProfile<1, -1, []>, []>;			def build_vector : SDNode<"ISD::BUILD_VECTOR", SDTypeProfile<1, -1, []>, []>;
	def splat_vector : SDNode<"ISD::SPLAT_VECTOR", SDTypeProfile<1, 1, []>, []>;			def splat_vector : SDNode<"ISD::SPLAT_VECTOR", SDTypeProfile<1, 1, []>, []>;
	def scalar_to_vector : SDNode<"ISD::SCALAR_TO_VECTOR", SDTypeProfile<1, 1, []>,			def scalar_to_vector : SDNode<"ISD::SCALAR_TO_VECTOR", SDTypeProfile<1, 1, []>,
	[]>;			[]>;

	// vector_extract/vector_insert are deprecated. extractelt/insertelt			// vector_extract/vector_insert are deprecated. extractelt/insertelt
	// are preferred.			// are preferred.
	def vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT",			def vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT",

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

private:

SDValue ExpandBITREVERSE(SDValue Op, const SDLoc &dl);		SDValue ExpandBITREVERSE(SDValue Op, const SDLoc &dl);
SDValue ExpandBSWAP(SDValue Op, const SDLoc &dl);		SDValue ExpandBSWAP(SDValue Op, const SDLoc &dl);
SDValue ExpandPARITY(SDValue Op, const SDLoc &dl);		SDValue ExpandPARITY(SDValue Op, const SDLoc &dl);

SDValue ExpandExtractFromVectorThroughStack(SDValue Op);		SDValue ExpandExtractFromVectorThroughStack(SDValue Op);
SDValue ExpandInsertToVectorThroughStack(SDValue Op);		SDValue ExpandInsertToVectorThroughStack(SDValue Op);
SDValue ExpandVectorBuildThroughStack(SDNode* Node);		SDValue ExpandVectorBuildThroughStack(SDNode* Node);
		SDValue ExpandVectorSpliceThroughStack(SDNode *Node);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'ExpandVectorSpliceThroughStack' [readability-identifier-naming]

SDValue ExpandConstantFP(ConstantFPSDNode *CFP, bool UseCP);		SDValue ExpandConstantFP(ConstantFPSDNode *CFP, bool UseCP);
SDValue ExpandConstant(ConstantSDNode *CP);		SDValue ExpandConstant(ConstantSDNode *CP);

// if ExpandNode returns false, LegalizeOp falls back to ConvertNodeToLibcall		// if ExpandNode returns false, LegalizeOp falls back to ConvertNodeToLibcall
bool ExpandNode(SDNode *Node);		bool ExpandNode(SDNode *Node);
void ConvertNodeToLibcall(SDNode *Node);		void ConvertNodeToLibcall(SDNode *Node);
void PromoteNode(SDNode *Node);		void PromoteNode(SDNode *Node);
[...]
if (!Stores.empty()) // Not all undef elements?
StoreChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Stores);		StoreChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Stores);
else		else
StoreChain = DAG.getEntryNode();		StoreChain = DAG.getEntryNode();

// Result is a load from the stack slot.		// Result is a load from the stack slot.
return DAG.getLoad(VT, dl, StoreChain, FIPtr, PtrInfo);		return DAG.getLoad(VT, dl, StoreChain, FIPtr, PtrInfo);
}		}

		SDValue SelectionDAGLegalize::ExpandVectorSpliceThroughStack(SDNode *Node) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'ExpandVectorSpliceThroughStack' [readability-identifier-naming]
		assert(Node->getOpcode() == ISD::VECTOR_SPLICE && "Unexpected opcode!");
		assert(Node->getValueType(0).isScalableVector() &&
		"Fixed length vector types expected to use SHUFFLE_VECTOR!");

		EVT VT = Node->getValueType(0);
		SDValue V1 = Node->getOperand(0);
		SDValue V2 = Node->getOperand(1);
		uint64_t TrailingElts = Node->getConstantOperandVal(2);
		SDLoc DL(Node);

		// Expand through memory thusly:
		// Alloca CONCAT_VECTORS_TYPES(V1, V2) Ptr
		// Store V1, Ptr
		// Store V2, Ptr + sizeof(V1)
		// Ptr = Ptr + sizeof(V1) - (TrailingElts * sizeof(VT.Elt))
		// Res = Load Ptr

		Type *StoreType = VT.getTypeForEVT(*DAG.getContext());
		Align Alignment = DAG.getDataLayout().getPrefTypeAlign(StoreType);

		EVT MemVT = EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
		VT.getVectorElementCount() * 2);
		SDValue Ptr = DAG.CreateStackTemporary(MemVT.getStoreSize(), Alignment);
		EVT PtrVT = Ptr.getValueType();
		auto &MF = DAG.getMachineFunction();
		auto FrameIndex = cast<FrameIndexSDNode>(Ptr.getNode())->getIndex();
		auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);

		// Store the lo part of CONCAT_VECTORS(V1, V2)
		SDValue StoreV1 = DAG.getStore(DAG.getEntryNode(), DL, V1, Ptr, PtrInfo);
		// Store the hi part of CONCAT_VECTORS(V1, V2)
		SDValue OffsetToV2 = DAG.getVScale(
		DL, PtrVT,
		APInt(PtrVT.getFixedSizeInBits(), VT.getStoreSize().getKnownMinSize()));
		Ptr = DAG.getNode(ISD::ADD, DL, PtrVT, Ptr, OffsetToV2);
		SDValue StoreV2 = DAG.getStore(StoreV1, DL, V2, Ptr, PtrInfo);

		// NOTE: TrailingElts must be clamped so as not to read outside of V1:V2.
		TypeSize EltByteSize = VT.getVectorElementType().getStoreSize();
		SDValue TrailingBytes =
		DAG.getConstant(TrailingElts * EltByteSize, DL, PtrVT);

		if (TrailingElts > VT.getVectorMinNumElements()) {
		SDValue VLBytes = DAG.getVScale(
		DL, PtrVT,
		APInt(PtrVT.getFixedSizeInBits(), VT.getStoreSize().getKnownMinSize()));
		TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VLBytes);
		}

		// Calculate the start address of the spliced result.
		Ptr = DAG.getNode(ISD::SUB, DL, PtrVT, Ptr, TrailingBytes);

		// Load the spliced result
		return DAG.getLoad(VT, DL, StoreV2, Ptr,
		MachinePointerInfo::getUnknownStack(MF));
		}
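
The store/store/load sequence in ExpandVectorSpliceThroughStack can be modelled outside of SelectionDAG with an ordinary byte buffer. The sketch below is an assumption-laden illustration, not the patch's code: `spliceThroughStack` is a hypothetical name, the element type is fixed to `int32_t`, and the vector length is a concrete runtime value rather than a multiple of vscale. It clamps in elements rather than bytes, which gives the same offset while avoiding overflow in the multiplication.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of the expansion; a byte buffer stands in for the
// stack temporary that holds CONCAT_VECTORS(V1, V2).
std::vector<int32_t> spliceThroughStack(const std::vector<int32_t> &V1,
                                        const std::vector<int32_t> &V2,
                                        uint64_t TrailingElts) {
  assert(V1.size() == V2.size() && "operands must have the same type");
  const size_t VL = V1.size();
  if (VL == 0)
    return {};

  const size_t EltBytes = sizeof(int32_t);
  const size_t VLBytes = VL * EltBytes;

  // Alloca CONCAT_VECTORS_TYPES(V1, V2) Ptr
  std::vector<unsigned char> Slot(2 * VLBytes);
  // Store V1, Ptr
  std::memcpy(Slot.data(), V1.data(), VLBytes);
  // Store V2, Ptr + sizeof(V1)
  std::memcpy(Slot.data() + VLBytes, V2.data(), VLBytes);

  // Clamp so the load cannot start before the beginning of V1.
  const size_t ClampedElts =
      static_cast<size_t>(std::min<uint64_t>(TrailingElts, VL));
  const size_t TrailingBytes = ClampedElts * EltBytes;

  // Ptr = Ptr + sizeof(V1) - TrailingBytes; Res = Load Ptr
  std::vector<int32_t> Res(VL);
  std::memcpy(Res.data(), Slot.data() + VLBytes - TrailingBytes, VLBytes);
  return Res;
}
```

In the patch the UMIN is only emitted when TrailingElts exceeds the known minimum element count, since smaller values can never reach past the start of V1 for any vscale; the model applies the clamp unconditionally for simplicity.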

/// Bitcast a floating-point value to an integer value. Only bitcast the part		/// Bitcast a floating-point value to an integer value. Only bitcast the part
/// containing the sign bit if the target has no integer value capable of		/// containing the sign bit if the target has no integer value capable of
/// holding all bits of the floating-point value.		/// holding all bits of the floating-point value.
void SelectionDAGLegalize::getSignAsIntValue(FloatSignAsInt &State,		void SelectionDAGLegalize::getSignAsIntValue(FloatSignAsInt &State,
const SDLoc &DL,		const SDLoc &DL,
SDValue Value) const {		SDValue Value) const {
EVT FloatVT = Value.getValueType();		EVT FloatVT = Value.getValueType();
unsigned NumBits = FloatVT.getScalarSizeInBits();		unsigned NumBits = FloatVT.getScalarSizeInBits();
[...]
case ISD::VECTOR_SHUFFLE: {
}		}

Tmp1 = DAG.getBuildVector(VT, dl, Ops);		Tmp1 = DAG.getBuildVector(VT, dl, Ops);
// We may have changed the BUILD_VECTOR type. Cast it back to the Node type.		// We may have changed the BUILD_VECTOR type. Cast it back to the Node type.
Tmp1 = DAG.getNode(ISD::BITCAST, dl, Node->getValueType(0), Tmp1);		Tmp1 = DAG.getNode(ISD::BITCAST, dl, Node->getValueType(0), Tmp1);
Results.push_back(Tmp1);		Results.push_back(Tmp1);
break;		break;
}		}
		case ISD::VECTOR_SPLICE: {
		Results.push_back(ExpandVectorSpliceThroughStack(Node));
		break;
		}
case ISD::EXTRACT_ELEMENT: {		case ISD::EXTRACT_ELEMENT: {
EVT OpTy = Node->getOperand(0).getValueType();		EVT OpTy = Node->getOperand(0).getValueType();
if (cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue()) {		if (cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue()) {
// 1 -> Hi		// 1 -> Hi
Tmp1 = DAG.getNode(ISD::SRL, dl, OpTy, Node->getOperand(0),		Tmp1 = DAG.getNode(ISD::SRL, dl, OpTy, Node->getOperand(0),
DAG.getConstant(OpTy.getSizeInBits() / 2, dl,		DAG.getConstant(OpTy.getSizeInBits() / 2, dl,
TLI.getShiftAmountTy(		TLI.getShiftAmountTy(
Node->getOperand(0).getValueType(),		Node->getOperand(0).getValueType(),
[...]
case ISD::VECTOR_SHUFFLE: {
Tmp2 = DAG.getNode(ISD::BITCAST, dl, NVT, Node->getOperand(1));		Tmp2 = DAG.getNode(ISD::BITCAST, dl, NVT, Node->getOperand(1));

// Convert the shuffle mask to the right # elements.		// Convert the shuffle mask to the right # elements.
Tmp1 = ShuffleWithNarrowerEltType(NVT, OVT, dl, Tmp1, Tmp2, Mask);		Tmp1 = ShuffleWithNarrowerEltType(NVT, OVT, dl, Tmp1, Tmp2, Mask);
Tmp1 = DAG.getNode(ISD::BITCAST, dl, OVT, Tmp1);		Tmp1 = DAG.getNode(ISD::BITCAST, dl, OVT, Tmp1);
Results.push_back(Tmp1);		Results.push_back(Tmp1);
break;		break;
}		}
		case ISD::VECTOR_SPLICE: {
		Tmp1 = DAG.getNode(ISD::ANY_EXTEND, dl, NVT, Node->getOperand(0));
		Tmp2 = DAG.getNode(ISD::ANY_EXTEND, dl, NVT, Node->getOperand(1));
		Tmp3 = DAG.getNode(ISD::VECTOR_SPLICE, dl, NVT, Tmp1, Tmp2,
		Node->getOperand(2));
		Results.push_back(DAG.getNode(ISD::TRUNCATE, dl, OVT, Tmp3));
		break;
		}
case ISD::SETCC:		case ISD::SETCC:
case ISD::STRICT_FSETCC:		case ISD::STRICT_FSETCC:
case ISD::STRICT_FSETCCS: {		case ISD::STRICT_FSETCCS: {
unsigned ExtOp = ISD::FP_EXTEND;		unsigned ExtOp = ISD::FP_EXTEND;
if (NVT.isInteger()) {		if (NVT.isInteger()) {
ISD::CondCode CCCode = cast<CondCodeSDNode>(Node->getOperand(2))->get();		ISD::CondCode CCCode = cast<CondCodeSDNode>(Node->getOperand(2))->get();
ExtOp = isSignedIntSetCC(CCCode) ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;		ExtOp = isSignedIntSetCC(CCCode) ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
}		}

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

#endif
case ISD::UNDEF: Res = PromoteIntRes_UNDEF(N); break;		case ISD::UNDEF: Res = PromoteIntRes_UNDEF(N); break;
case ISD::VAARG: Res = PromoteIntRes_VAARG(N); break;		case ISD::VAARG: Res = PromoteIntRes_VAARG(N); break;
case ISD::VSCALE: Res = PromoteIntRes_VSCALE(N); break;		case ISD::VSCALE: Res = PromoteIntRes_VSCALE(N); break;

case ISD::EXTRACT_SUBVECTOR:		case ISD::EXTRACT_SUBVECTOR:
Res = PromoteIntRes_EXTRACT_SUBVECTOR(N); break;		Res = PromoteIntRes_EXTRACT_SUBVECTOR(N); break;
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;		Res = PromoteIntRes_VECTOR_SHUFFLE(N); break;
		case ISD::VECTOR_SPLICE:
		Res = PromoteIntRes_VECTOR_SPLICE(N); break;
		Lint: Pre-merge checks: clang-format: please reformat the code
		- Res = PromoteIntRes_VECTOR_SPLICE(N); break;
		+ Res = PromoteIntRes_VECTOR_SPLICE(N);
		+ break;
case ISD::INSERT_VECTOR_ELT:		case ISD::INSERT_VECTOR_ELT:
Res = PromoteIntRes_INSERT_VECTOR_ELT(N); break;		Res = PromoteIntRes_INSERT_VECTOR_ELT(N); break;
case ISD::BUILD_VECTOR:		case ISD::BUILD_VECTOR:
Res = PromoteIntRes_BUILD_VECTOR(N); break;		Res = PromoteIntRes_BUILD_VECTOR(N); break;
case ISD::SCALAR_TO_VECTOR:		case ISD::SCALAR_TO_VECTOR:
Res = PromoteIntRes_SCALAR_TO_VECTOR(N); break;		Res = PromoteIntRes_SCALAR_TO_VECTOR(N); break;
case ISD::SPLAT_VECTOR:		case ISD::SPLAT_VECTOR:
Res = PromoteIntRes_SPLAT_VECTOR(N); break;		Res = PromoteIntRes_SPLAT_VECTOR(N); break;
[...]
SDValue DAGTypeLegalizer::ExpandIntOp_ATOMIC_STORE(SDNode *N) {
SDValue Swap = DAG.getAtomic(ISD::ATOMIC_SWAP, dl,		SDValue Swap = DAG.getAtomic(ISD::ATOMIC_SWAP, dl,
cast<AtomicSDNode>(N)->getMemoryVT(),		cast<AtomicSDNode>(N)->getMemoryVT(),
N->getOperand(0),		N->getOperand(0),
N->getOperand(1), N->getOperand(2),		N->getOperand(1), N->getOperand(2),
cast<AtomicSDNode>(N)->getMemOperand());		cast<AtomicSDNode>(N)->getMemOperand());
return Swap.getValue(1);		return Swap.getValue(1);
}		}

		SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_SPLICE(SDNode *N) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'PromoteIntRes_VECTOR_SPLICE' [readability-identifier-naming]
		SDLoc dl(N);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'dl' [readability-identifier-naming]

		SDValue V0 = GetPromotedInteger(N->getOperand(0));
		SDValue V1 = GetPromotedInteger(N->getOperand(1));
		EVT OutVT = V0.getValueType();

		return DAG.getNode(ISD::VECTOR_SPLICE, dl, OutVT, V0, V1, N->getOperand(2));
		}
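
Both promotion paths (the ANY_EXTEND / VECTOR_SPLICE / TRUNCATE sequence in LegalizeDAG.cpp and PromoteIntRes_VECTOR_SPLICE here) rely on the fact that a splice only moves whole elements, so widening the element type before splicing does not change which elements end up in the result. The following sketch illustrates that property under stated assumptions: `spliceTrailing` and `spliceViaPromotion` are hypothetical helpers, the immediate is treated as an in-range trailing-element count as in this revision of the diff, and zero extension is used as one valid choice for ANY_EXTEND.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: RESULT[i] = CONCAT(V1, V2)[VL - TrailingElts + i].
template <typename T>
std::vector<T> spliceTrailing(const std::vector<T> &V1,
                              const std::vector<T> &V2, size_t TrailingElts) {
  assert(V1.size() == V2.size() && TrailingElts <= V1.size());
  std::vector<T> Concat(V1);
  Concat.insert(Concat.end(), V2.begin(), V2.end());
  const auto First = Concat.begin() + (V1.size() - TrailingElts);
  return std::vector<T>(First, First + V1.size());
}

// Splice i8 elements by promoting to i32, splicing, then truncating back.
std::vector<uint8_t> spliceViaPromotion(const std::vector<uint8_t> &V1,
                                        const std::vector<uint8_t> &V2,
                                        size_t TrailingElts) {
  // ANY_EXTEND: high bits are unspecified; zero extension is one valid choice.
  const std::vector<uint32_t> W1(V1.begin(), V1.end());
  const std::vector<uint32_t> W2(V2.begin(), V2.end());

  // VECTOR_SPLICE on the promoted type with the same immediate.
  const std::vector<uint32_t> Wide = spliceTrailing(W1, W2, TrailingElts);

  // TRUNCATE back to the original element type.
  std::vector<uint8_t> Res;
  Res.reserve(Wide.size());
  for (uint32_t Elt : Wide)
    Res.push_back(static_cast<uint8_t>(Elt));
  return Res;
}
```

Because the immediate counts elements rather than bytes, it can be passed through unchanged when the element type is widened.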

SDValue DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N) {

EVT OutVT = N->getValueType(0);		EVT OutVT = N->getValueType(0);
EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT);		EVT NOutVT = TLI.getTypeToTransformTo(*DAG.getContext(), OutVT);
assert(NOutVT.isVector() && "This type must be promoted to a vector type");		assert(NOutVT.isVector() && "This type must be promoted to a vector type");
EVT NOutVTElem = NOutVT.getVectorElementType();		EVT NOutVTElem = NOutVT.getVectorElementType();


llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

private:
SDValue PromoteIntRes_MERGE_VALUES(SDNode *N, unsigned ResNo);		SDValue PromoteIntRes_MERGE_VALUES(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_AssertSext(SDNode *N);		SDValue PromoteIntRes_AssertSext(SDNode *N);
SDValue PromoteIntRes_AssertZext(SDNode *N);		SDValue PromoteIntRes_AssertZext(SDNode *N);
SDValue PromoteIntRes_Atomic0(AtomicSDNode *N);		SDValue PromoteIntRes_Atomic0(AtomicSDNode *N);
SDValue PromoteIntRes_Atomic1(AtomicSDNode *N);		SDValue PromoteIntRes_Atomic1(AtomicSDNode *N);
SDValue PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N, unsigned ResNo);		SDValue PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N, unsigned ResNo);
SDValue PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N);		SDValue PromoteIntRes_EXTRACT_SUBVECTOR(SDNode *N);
SDValue PromoteIntRes_VECTOR_SHUFFLE(SDNode *N);		SDValue PromoteIntRes_VECTOR_SHUFFLE(SDNode *N);
		SDValue PromoteIntRes_VECTOR_SPLICE(SDNode *N);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'PromoteIntRes_VECTOR_SPLICE' [readability-identifier-naming]
SDValue PromoteIntRes_BUILD_VECTOR(SDNode *N);		SDValue PromoteIntRes_BUILD_VECTOR(SDNode *N);
SDValue PromoteIntRes_SCALAR_TO_VECTOR(SDNode *N);		SDValue PromoteIntRes_SCALAR_TO_VECTOR(SDNode *N);
SDValue PromoteIntRes_SPLAT_VECTOR(SDNode *N);		SDValue PromoteIntRes_SPLAT_VECTOR(SDNode *N);
SDValue PromoteIntRes_EXTEND_VECTOR_INREG(SDNode *N);		SDValue PromoteIntRes_EXTEND_VECTOR_INREG(SDNode *N);
SDValue PromoteIntRes_INSERT_VECTOR_ELT(SDNode *N);		SDValue PromoteIntRes_INSERT_VECTOR_ELT(SDNode *N);
SDValue PromoteIntRes_CONCAT_VECTORS(SDNode *N);		SDValue PromoteIntRes_CONCAT_VECTORS(SDNode *N);
SDValue PromoteIntRes_BITCAST(SDNode *N);		SDValue PromoteIntRes_BITCAST(SDNode *N);
SDValue PromoteIntRes_BSWAP(SDNode *N);		SDValue PromoteIntRes_BSWAP(SDNode *N);
[...]
private:
void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_INSERT_VECTOR_ELT(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);		void SplitVecRes_LOAD(LoadSDNode *LD, SDValue &Lo, SDValue &Hi);
void SplitVecRes_MLOAD(MaskedLoadSDNode *MLD, SDValue &Lo, SDValue &Hi);		void SplitVecRes_MLOAD(MaskedLoadSDNode *MLD, SDValue &Lo, SDValue &Hi);
void SplitVecRes_MGATHER(MaskedGatherSDNode *MGT, SDValue &Lo, SDValue &Hi);		void SplitVecRes_MGATHER(MaskedGatherSDNode *MGT, SDValue &Lo, SDValue &Hi);
void SplitVecRes_ScalarOp(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_ScalarOp(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_SETCC(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_SETCC(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_VECTOR_SHUFFLE(ShuffleVectorSDNode *N, SDValue &Lo,		void SplitVecRes_VECTOR_SHUFFLE(ShuffleVectorSDNode *N, SDValue &Lo,
SDValue &Hi);		SDValue &Hi);
		void SplitVecRes_VECTOR_SPLICE(SDNode *N, SDValue &Lo, SDValue &Hi);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'SplitVecRes_VECTOR_SPLICE' [readability-identifier-naming]
void SplitVecRes_VAARG(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_VAARG(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_FP_TO_XINT_SAT(SDNode *N, SDValue &Lo, SDValue &Hi);		void SplitVecRes_FP_TO_XINT_SAT(SDNode *N, SDValue &Lo, SDValue &Hi);

// Vector Operand Splitting: <128 x ty> -> 2 x <64 x ty>.		// Vector Operand Splitting: <128 x ty> -> 2 x <64 x ty>.
bool SplitVectorOperand(SDNode *N, unsigned OpNo);		bool SplitVectorOperand(SDNode *N, unsigned OpNo);
SDValue SplitVecOp_VSELECT(SDNode *N, unsigned OpNo);		SDValue SplitVecOp_VSELECT(SDNode *N, unsigned OpNo);
SDValue SplitVecOp_VECREDUCE(SDNode *N, unsigned OpNo);		SDValue SplitVecOp_VECREDUCE(SDNode *N, unsigned OpNo);
SDValue SplitVecOp_VECREDUCE_SEQ(SDNode *N);		SDValue SplitVecOp_VECREDUCE_SEQ(SDNode *N);

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

case ISD::MGATHER:
SplitVecRes_MGATHER(cast<MaskedGatherSDNode>(N), Lo, Hi);		SplitVecRes_MGATHER(cast<MaskedGatherSDNode>(N), Lo, Hi);
break;		break;
case ISD::SETCC:		case ISD::SETCC:
SplitVecRes_SETCC(N, Lo, Hi);		SplitVecRes_SETCC(N, Lo, Hi);
break;		break;
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
SplitVecRes_VECTOR_SHUFFLE(cast<ShuffleVectorSDNode>(N), Lo, Hi);		SplitVecRes_VECTOR_SHUFFLE(cast<ShuffleVectorSDNode>(N), Lo, Hi);
break;		break;
		case ISD::VECTOR_SPLICE:
		SplitVecRes_VECTOR_SPLICE(N, Lo, Hi);
		paulwalker-arm: This doesn't look great to me because it ties the hands of expandVectorSplice, whose name does not suggest it must return a load. If you prefer not to explicitly create the split loads then it seems more natural, and future proof, if the result of expandVectorSplice is split using a pair of EXTRACT_SUBVECTOR operations.
		c-rhodes (Author):
		> This doesn't look great to me because it ties the hands of expandVectorSplice, whose name does not suggest it must return a load. If you prefer not to explicitly create the split loads then it seems more natural, and future proof, if the result of expandVectorSplice is split using a pair of EXTRACT_SUBVECTOR operations.
		Fair point, I've used EXTRACT_SUBVECTOR instead. I should point out warnings were being generated for the `nxv16f32` splitvec tests by the `getVectorNumElements` call in `DAGTypeLegalizer::SplitVecRes_EXTRACT_SUBVECTOR`. To remove the warnings I changed it to `getVectorMinNumElements`.
		break;
case ISD::VAARG:		case ISD::VAARG:
SplitVecRes_VAARG(N, Lo, Hi);		SplitVecRes_VAARG(N, Lo, Hi);
break;		break;

case ISD::ANY_EXTEND_VECTOR_INREG:		case ISD::ANY_EXTEND_VECTOR_INREG:
case ISD::SIGN_EXTEND_VECTOR_INREG:		case ISD::SIGN_EXTEND_VECTOR_INREG:
case ISD::ZERO_EXTEND_VECTOR_INREG:		case ISD::ZERO_EXTEND_VECTOR_INREG:
SplitVecRes_ExtVecInRegOp(N, Lo, Hi);		SplitVecRes_ExtVecInRegOp(N, Lo, Hi);
[...]
Ops[Idx] = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, EltVT, InOp,
DAG.getVectorIdxConstant(Idx, dl));		DAG.getVectorIdxConstant(Idx, dl));

SDValue FillVal = FillWithZeroes ? DAG.getConstant(0, dl, EltVT) :		SDValue FillVal = FillWithZeroes ? DAG.getConstant(0, dl, EltVT) :
DAG.getUNDEF(EltVT);		DAG.getUNDEF(EltVT);
for ( ; Idx < WidenNumElts; ++Idx)		for ( ; Idx < WidenNumElts; ++Idx)
Ops[Idx] = FillVal;		Ops[Idx] = FillVal;
return DAG.getBuildVector(NVT, dl, Ops);		return DAG.getBuildVector(NVT, dl, Ops);
}		}

		void DAGTypeLegalizer::SplitVecRes_VECTOR_SPLICE(SDNode *N, SDValue &Lo,
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'SplitVecRes_VECTOR_SPLICE' [readability-identifier-naming]
		SDValue &Hi) {
		EVT VT = N->getValueType(0);
		SDValue V1 = N->getOperand(0);
		SDValue V2 = N->getOperand(1);
		uint64_t TrailingElts = N->getConstantOperandVal(2);
		SDLoc DL(N);

		EVT LoVT, HiVT;
		std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(VT);

		// The operation cannot be split in two so expand it instead:
		// Alloca CONCAT_VECTORS_TYPES(V1, V2) Ptr
		// Store V1, Ptr
		// Store V2, Ptr + sizeof(V1)
		// Ptr = Ptr + sizeof(V1) - (TrailingElts * sizeof(VT.Elt))
		// Lo = Load Ptr
		// hi = Load Ptr + sizeof(lo)

		// In cases where the vector is illegal it will be broken down into parts
		sdesmalen: is there a way to split the load using existing code?
		c-rhodes (Author):
		> is there a way to split the load using existing code?
		Splitting the full load from `expandVectorSplice` with `SplitVecRes_LOAD` seems to work.
		// and stored in parts - we should use the alignment for the smallest part.
		Align SmallestAlign = DAG.getReducedAlign(VT, /*UseABI=*/false);

		EVT MemVT = EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
		VT.getVectorElementCount() * 2);
		SDValue Ptr = DAG.CreateStackTemporary(MemVT.getStoreSize(), SmallestAlign);
		EVT PtrVT = Ptr.getValueType();
		auto &MF = DAG.getMachineFunction();
		auto FrameIndex = cast<FrameIndexSDNode>(Ptr.getNode())->getIndex();
		auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);

		// Store the lo part of CONCAT_VECTORS(V1, V2)
		SDValue StoreV1 =
		DAG.getStore(DAG.getEntryNode(), DL, V1, Ptr, PtrInfo, SmallestAlign);
		// Store the hi part of CONCAT_VECTORS(V1, V2)
		IncrementPointer(cast<MemSDNode>(StoreV1), VT, PtrInfo, Ptr);
		SDValue StoreV2 = DAG.getStore(StoreV1, DL, V2, Ptr, PtrInfo, SmallestAlign);

		// NOTE: TrailingElts must be clamped so as not to read outside of V1:V2.
		TypeSize EltByteSize = VT.getVectorElementType().getStoreSize();
		SDValue TrailingBytes =
		DAG.getConstant(TrailingElts * EltByteSize, DL, PtrVT);
		if (TrailingElts > VT.getVectorMinNumElements()) {
		SDValue VLBytes = DAG.getVScale(
		DL, PtrVT,
		APInt(PtrVT.getFixedSizeInBits(), VT.getStoreSize().getKnownMinSize()));
		TrailingBytes = DAG.getNode(ISD::UMIN, DL, PtrVT, TrailingBytes, VLBytes);
		}

		// Calculate the start address of the spliced result.
		Ptr = DAG.getNode(ISD::SUB, DL, PtrVT, Ptr, TrailingBytes);

		// Load the lo part of the spliced result
		Lo = DAG.getLoad(LoVT, DL, StoreV2, Ptr,
		MachinePointerInfo::getUnknownStack(MF));
		// Load the hi part of the spliced result
		MachinePointerInfo MPI = cast<LoadSDNode>(Lo)->getPointerInfo();
		IncrementPointer(cast<MemSDNode>(Lo), LoVT, MPI, Ptr);
		Hi = DAG.getLoad(HiVT, DL, StoreV2, Ptr, MPI);
		}
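
The split variant differs from the full expansion only in the final step: instead of one load of VT, it loads the Lo half at the computed pointer and the Hi half immediately after it. A minimal sketch of that step follows, assuming an even, concrete vector length and an in-range trailing-element count; `SplitSplice` and `splitVectorSplice` are hypothetical names, and a `std::vector` stands in for the stack slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two halves of the split result.
struct SplitSplice {
  std::vector<int32_t> Lo;
  std::vector<int32_t> Hi;
};

SplitSplice splitVectorSplice(const std::vector<int32_t> &V1,
                              const std::vector<int32_t> &V2,
                              size_t TrailingElts) {
  assert(V1.size() == V2.size() && V1.size() % 2 == 0);
  assert(TrailingElts <= V1.size() && "model assumes an in-range count");
  const size_t VL = V1.size();
  const size_t Half = VL / 2;

  // Store V1 followed by V2 into one contiguous slot.
  std::vector<int32_t> Slot(V1);
  Slot.insert(Slot.end(), V2.begin(), V2.end());

  // Ptr = Ptr + sizeof(V1) - (TrailingElts * sizeof(VT.Elt))
  const auto Ptr = Slot.begin() + (VL - TrailingElts);

  SplitSplice Res;
  Res.Lo.assign(Ptr, Ptr + Half);      // Lo = Load Ptr
  Res.Hi.assign(Ptr + Half, Ptr + VL); // Hi = Load Ptr + sizeof(Lo)
  return Res;
}
```

Concatenating `Lo` and `Hi` gives the same elements as a single full-width load at the same pointer, which is why the review discussion could replace the two explicit loads with a split of the full result.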

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

private:
void visitStackmap(const CallInst &I);		void visitStackmap(const CallInst &I);
void visitPatchpoint(const CallBase &CB, const BasicBlock *EHPadBB = nullptr);		void visitPatchpoint(const CallBase &CB, const BasicBlock *EHPadBB = nullptr);

// These two are implemented in StatepointLowering.cpp		// These two are implemented in StatepointLowering.cpp
void visitGCRelocate(const GCRelocateInst &Relocate);		void visitGCRelocate(const GCRelocateInst &Relocate);
void visitGCResult(const GCResultInst &I);		void visitGCResult(const GCResultInst &I);

void visitVectorReduce(const CallInst &I, unsigned Intrinsic);		void visitVectorReduce(const CallInst &I, unsigned Intrinsic);
		void visitVectorSplice(const CallInst &I);

void visitUserOp1(const Instruction &I) {		void visitUserOp1(const Instruction &I) {
llvm_unreachable("UserOp1 should not exist at instruction selection time!");		llvm_unreachable("UserOp1 should not exist at instruction selection time!");
}		}
void visitUserOp2(const Instruction &I) {		void visitUserOp2(const Instruction &I) {
llvm_unreachable("UserOp2 should not exist at instruction selection time!");		llvm_unreachable("UserOp2 should not exist at instruction selection time!");
}		}


llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

case Intrinsic::experimental_vector_extract: {

SDValue Vec = getValue(I.getOperand(0));		SDValue Vec = getValue(I.getOperand(0));
SDValue Index = getValue(I.getOperand(1));		SDValue Index = getValue(I.getOperand(1));
EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());		EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());

setValue(&I, DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResultVT, Vec, Index));		setValue(&I, DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResultVT, Vec, Index));
return;		return;
}		}
		case Intrinsic::experimental_vector_splice:
		visitVectorSplice(I);
		return;
}		}
}		}

void SelectionDAGBuilder::visitConstrainedFPIntrinsic(		void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
const ConstrainedFPIntrinsic &FPI) {		const ConstrainedFPIntrinsic &FPI) {
SDLoc sdl = getCurSDLoc();		SDLoc sdl = getCurSDLoc();

const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
▲ Show 20 Lines • Show All 3,808 Lines • ▼ Show 20 Lines	void SelectionDAGBuilder::visitFreeze(const FreezeInst &I) {

for (unsigned i = 0; i != NumValues; ++i)		for (unsigned i = 0; i != NumValues; ++i)
Values[i] = DAG.getNode(ISD::FREEZE, getCurSDLoc(), ValueVTs[i],		Values[i] = DAG.getNode(ISD::FREEZE, getCurSDLoc(), ValueVTs[i],
SDValue(Op.getNode(), Op.getResNo() + i));		SDValue(Op.getNode(), Op.getResNo() + i));

setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurSDLoc(),		setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurSDLoc(),
DAG.getVTList(ValueVTs), Values));		DAG.getVTList(ValueVTs), Values));
}		}

		void SelectionDAGBuilder::visitVectorSplice(const CallInst &I) {
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());

		SDLoc DL = getCurSDLoc();
		SDValue V1 = getValue(I.getOperand(0));
		SDValue V2 = getValue(I.getOperand(1));
		unsigned TrailingElts = cast<ConstantInt>(I.getOperand(2))->getZExtValue();

		// VECTOR_SHUFFLE doesn't support a scalable mask so use a dedicated node.
		if (VT.isScalableVector()) {
		MVT IdxVT = TLI.getVectorIdxTy(DAG.getDataLayout());
		setValue(&I, DAG.getNode(ISD::VECTOR_SPLICE, DL, VT, V1, V2,
		DAG.getConstant(TrailingElts, DL, IdxVT)));
		return;
		}

		unsigned NumElts = VT.getVectorNumElements();
		assert(TrailingElts <= NumElts && "Invalid number of trailing elements!");

		// Use VECTOR_SHUFFLE to maintain original behaviour for fixed-length vectors.
		SmallVector<int, 8> Mask;
		for (unsigned i = 0; i != NumElts; ++i)
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
		Mask.push_back(NumElts - TrailingElts + i);

		setValue(&I, DAG.getVectorShuffle(VT, DL, V1, V2, Mask));
		sdesmalen (not done): just `Imm % NumElts`; ?
		c-rhodes (author, done): > just `Imm % NumElts`; ? I tried that at first but found the behaviour isn't correct for negative immediates. For example, with a trailing element count of `-15` and 16 elements, `-15 % 16 = -15`, so we end up with this mask: `[-15,-14,-13,-12,-11,-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0]`. From what I read, the sign of the modulo result is implementation-defined if one or both of the operands are negative.
		bin.cheng-ali (done): Then `(NumElts + Imm) % NumElts` ?
		}
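The modulo discussion above can be checked with a small standalone sketch. It assumes the signed-immediate form from the patch summary (a non-negative value is an index into V1, a negative value is a trailing element count) and a hypothetical helper name `buildSpliceMask`. Note that in C++11 and later, `%` on integers truncates toward zero, so `-15 % 16` is `-15`, which is why the immediate is biased by `NumElts` before taking the remainder.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the patch): build the shuffle mask for a
// fixed-length splice from a signed immediate. A non-negative Imm is an index
// into V1; a negative Imm is a trailing element count.
std::vector<int> buildSpliceMask(int Imm, int NumElts) {
  assert(Imm >= -NumElts && Imm < NumElts && "immediate out of range");
  // Adding NumElts first keeps the left operand of % non-negative, so the
  // remainder is the start index for both variants.
  const int Start = (NumElts + Imm) % NumElts;
  std::vector<int> Mask;
  for (int I = 0; I != NumElts; ++I)
    Mask.push_back(Start + I); // indices >= NumElts select from V2
  return Mask;
}
```

For four elements, immediates `1` and `-3` both produce the mask `[1, 2, 3, 4]`, matching the two examples in the summary.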

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

#endif
case ISD::SELECT_CC: return "select_cc";		case ISD::SELECT_CC: return "select_cc";
case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";		case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";
case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";		case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";
case ISD::CONCAT_VECTORS: return "concat_vectors";		case ISD::CONCAT_VECTORS: return "concat_vectors";
case ISD::INSERT_SUBVECTOR: return "insert_subvector";		case ISD::INSERT_SUBVECTOR: return "insert_subvector";
case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";		case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";
case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";		case ISD::SCALAR_TO_VECTOR: return "scalar_to_vector";
case ISD::VECTOR_SHUFFLE: return "vector_shuffle";		case ISD::VECTOR_SHUFFLE: return "vector_shuffle";
		case ISD::VECTOR_SPLICE: return "vector_splice";
		Lint (pre-merge checks): clang-format: please reformat the code
		- case ISD::VECTOR_SPLICE: return "vector_splice";
		+ case ISD::VECTOR_SPLICE:
		+   return "vector_splice";
case ISD::SPLAT_VECTOR: return "splat_vector";		case ISD::SPLAT_VECTOR: return "splat_vector";
case ISD::CARRY_FALSE: return "carry_false";		case ISD::CARRY_FALSE: return "carry_false";
case ISD::ADDC: return "addc";		case ISD::ADDC: return "addc";
case ISD::ADDE: return "adde";		case ISD::ADDE: return "adde";
case ISD::ADDCARRY: return "addcarry";		case ISD::ADDCARRY: return "addcarry";
case ISD::SADDO_CARRY: return "saddo_carry";		case ISD::SADDO_CARRY: return "saddo_carry";
case ISD::SADDO: return "saddo";		case ISD::SADDO: return "saddo";
case ISD::UADDO: return "uaddo";		case ISD::UADDO: return "uaddo";

llvm/lib/CodeGen/TargetLoweringBase.cpp

#include "llvm/IR/ConstrainedOps.def"
setOperationAction(ISD::VECREDUCE_SMAX, VT, Expand);		setOperationAction(ISD::VECREDUCE_SMAX, VT, Expand);
setOperationAction(ISD::VECREDUCE_SMIN, VT, Expand);		setOperationAction(ISD::VECREDUCE_SMIN, VT, Expand);
setOperationAction(ISD::VECREDUCE_UMAX, VT, Expand);		setOperationAction(ISD::VECREDUCE_UMAX, VT, Expand);
setOperationAction(ISD::VECREDUCE_UMIN, VT, Expand);		setOperationAction(ISD::VECREDUCE_UMIN, VT, Expand);
setOperationAction(ISD::VECREDUCE_FMAX, VT, Expand);		setOperationAction(ISD::VECREDUCE_FMAX, VT, Expand);
setOperationAction(ISD::VECREDUCE_FMIN, VT, Expand);		setOperationAction(ISD::VECREDUCE_FMIN, VT, Expand);
setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Expand);		setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Expand);
setOperationAction(ISD::VECREDUCE_SEQ_FMUL, VT, Expand);		setOperationAction(ISD::VECREDUCE_SEQ_FMUL, VT, Expand);

		// Named vector shuffles default to expand.
		setOperationAction(ISD::VECTOR_SPLICE, VT, Expand);
}		}

// Most targets ignore the @llvm.prefetch intrinsic.		// Most targets ignore the @llvm.prefetch intrinsic.
setOperationAction(ISD::PREFETCH, MVT::Other, Expand);		setOperationAction(ISD::PREFETCH, MVT::Other, Expand);

// Most targets also ignore the @llvm.readcyclecounter intrinsic.		// Most targets also ignore the @llvm.readcyclecounter intrinsic.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Expand);		setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Expand);


llvm/lib/Target/AArch64/AArch64ISelLowering.h

private:
SDValue LowerSPONENTRY(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSPONENTRY(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerVECTOR_SPLICE(SDValue Op, SelectionDAG &DAG) const;
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for function 'LowerVECTOR_SPLICE' [readability-identifier-naming]
		paulwalker-arm (not done): It looks like you're missing the implementation of this function and some associated isel patterns, which explains the poor code generation for the SVE _1 test variants. Or are you planning to add those under a second patch, in which case this function declaration should be removed from this patch.
		c-rhodes (author, done): > It looks like you're missing the implementation of this function and some associated isel patterns, which explains the poor code generation for the SVE _1 test variants. Or are you planning to add those under a second patch, in which case this function declaration should be removed from this patch.
		The plan is to upstream those patterns separately. They were initially part of this patch, but they depended on some changes we have downstream to use the SIMD variant of INSR when the scalar argument comes from a vector extract. It looks like I missed this declaration; I'll remove it.
SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG, unsigned NewOp,		SDValue LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG, unsigned NewOp,
bool OverrideNEON = false) const;		bool OverrideNEON = false) const;
SDValue LowerToScalableOp(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerToScalableOp(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDIV(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDIV(SDValue Op, SelectionDAG &DAG) const;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
setOperationAction(ISD::SINT_TO_FP, VT, Custom);		setOperationAction(ISD::SINT_TO_FP, VT, Custom);
setOperationAction(ISD::FP_TO_UINT, VT, Custom);		setOperationAction(ISD::FP_TO_UINT, VT, Custom);
setOperationAction(ISD::FP_TO_SINT, VT, Custom);		setOperationAction(ISD::FP_TO_SINT, VT, Custom);
setOperationAction(ISD::MGATHER, VT, Custom);		setOperationAction(ISD::MGATHER, VT, Custom);
setOperationAction(ISD::MSCATTER, VT, Custom);		setOperationAction(ISD::MSCATTER, VT, Custom);
setOperationAction(ISD::MUL, VT, Custom);		setOperationAction(ISD::MUL, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
		setOperationAction(ISD::SETCC, VT, Custom);
		cameron.mcinally (not done): Is this change required for the splice patch? I wonder if it should be broken out to its own commit.
		c-rhodes (author, not done): > Is this change required for the splice patch? I wonder if it should be broken out to its own commit.
		This fixes a selection error I hit when splicing predicates. The promotion of predicates uses a truncate which gets lowered to a setcc that crashes on operands of type `VT`, e.g.:
		t17: nxv2i64 = and t29, t30
		t31: nxv2i64 = AArch64ISD::DUP Constant:i64<0>
		t19: nxv2i1 = setcc t17, t31, setne:ch
		I can split this into a separate patch.
		c-rhodes (author, done): I tried splitting this out but I couldn't figure out how to test it, so I've left it in this patch. What I found was that the setcc generated by the splice occurs after result type legalization, so without the above a setcc returning an `nxv2i1` and taking two `nxv2i64` vectors is considered legal at this point in selection. With the above it falls into `SelectionDAGLegalize::LegalizeOp`, which considers the legality based on the operand VTs, so the setcc then gets custom lowered to the target-specific predicated variant `AArch64ISD::SETCC_MERGE_ZERO` that can be selected.
setOperationAction(ISD::SDIV, VT, Custom);		setOperationAction(ISD::SDIV, VT, Custom);
setOperationAction(ISD::UDIV, VT, Custom);		setOperationAction(ISD::UDIV, VT, Custom);
setOperationAction(ISD::SMIN, VT, Custom);		setOperationAction(ISD::SMIN, VT, Custom);
setOperationAction(ISD::UMIN, VT, Custom);		setOperationAction(ISD::UMIN, VT, Custom);
setOperationAction(ISD::SMAX, VT, Custom);		setOperationAction(ISD::SMAX, VT, Custom);
setOperationAction(ISD::UMAX, VT, Custom);		setOperationAction(ISD::UMAX, VT, Custom);
setOperationAction(ISD::SHL, VT, Custom);		setOperationAction(ISD::SHL, VT, Custom);
setOperationAction(ISD::SRL, VT, Custom);		setOperationAction(ISD::SRL, VT, Custom);
▲ Show 20 Lines • Show All 147 Lines • ▼ Show 20 Lines	if (Subtarget->useSVEForFixedLengthVectors()) {
for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32,		for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32,
MVT::v1f64, MVT::v2f64})		MVT::v1f64, MVT::v2f64})
setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);		setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);

// Use SVE for vectors with more than 2 elements.		// Use SVE for vectors with more than 2 elements.
for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v4f32})		for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v4f32})
setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);		setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
}		}

		setOperationAction(ISD::VECTOR_SPLICE, MVT::nxv2i1, Promote);
		craig.topper (done): Can you use setOperationPromotedToType here?
		c-rhodes (author, not done): > Can you use setOperationPromotedToType here? That's nice, I wasn't aware of that, thanks.
		AddPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv2i1, MVT::nxv2i64);
		setOperationAction(ISD::VECTOR_SPLICE, MVT::nxv4i1, Promote);
		AddPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv4i1, MVT::nxv4i32);
		setOperationAction(ISD::VECTOR_SPLICE, MVT::nxv8i1, Promote);
		AddPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv8i1, MVT::nxv8i16);
		setOperationAction(ISD::VECTOR_SPLICE, MVT::nxv16i1, Promote);
		AddPromotedToType(ISD::VECTOR_SPLICE, MVT::nxv16i1, MVT::nxv16i8);
}		}

PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();		PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();
}		}
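The `Promote` entries above mean a splice of predicate vectors is performed on a wider integer type and then narrowed back. A scalar model of that round trip is sketched below; it assumes the widening behaves like a zero-extend and mirrors the narrowing with the `and` plus `setcc ... setne` pair from the DAG dump in the review comments. All names (`Pred`, `Wide`, `splicePred`, and so on) are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Pred = std::vector<bool>;          // models e.g. nxv2i1 lanes
using Wide = std::vector<std::uint64_t>; // models e.g. nxv2i64 lanes

// Widen each i1 lane to a 64-bit lane (modelled here as a zero-extend).
Wide extend(const Pred &P) {
  Wide W;
  for (bool B : P)
    W.push_back(B ? 1u : 0u);
  return W;
}

// Splice on the wide type: the last TrailingElts lanes of A followed by the
// leading lanes of B, so the result has the same length as the inputs.
Wide spliceWide(const Wide &A, const Wide &B, std::size_t TrailingElts) {
  assert(A.size() == B.size() && TrailingElts <= A.size());
  const auto N = static_cast<std::ptrdiff_t>(TrailingElts);
  Wide R(A.end() - N, A.end());
  R.insert(R.end(), B.begin(), B.end() - N);
  return R;
}

// Narrow back to i1 lanes: (lane & 1) != 0, i.e. an and followed by a
// compare-not-equal against zero.
Pred truncate(const Wide &W) {
  Pred P;
  for (std::uint64_t L : W)
    P.push_back((L & 1) != 0);
  return P;
}

Pred splicePred(const Pred &A, const Pred &B, std::size_t TrailingElts) {
  return truncate(spliceWide(extend(A), extend(B), TrailingElts));
}
```

The compare step in `truncate` corresponds to the `setcc` that the `ISD::SETCC` custom-lowering change in this file is there to handle.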

void AArch64TargetLowering::addTypeForNEON(MVT VT, MVT PromotedBitwiseVT) {		void AArch64TargetLowering::addTypeForNEON(MVT VT, MVT PromotedBitwiseVT) {
assert(VT.isVector() && "VT should be a vector type");		assert(VT.isVector() && "VT should be a vector type");


llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -verify-machineinstrs < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				;
				; VECTOR_SPLICE
				;

				define <16 x i8> @splice_v16i8(<16 x i8> %a, <16 x i8> %b) #0 {
				; CHECK-LABEL: splice_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #1
				; CHECK-NEXT: ret
				%res = call <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8> %a, <16 x i8> %b, i32 15)
				ret <16 x i8> %res
				}

				define <8 x i16> @splice_v8i16(<8 x i16> %a, <8 x i16> %b) #0 {
				; CHECK-LABEL: splice_v8i16:
				sdesmalen (done): nit: not sure if there is a lot of value to do this for each possible element-type, if you do two (i8 and some other type) that may be sufficient.
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #4
				; CHECK-NEXT: ret
				%res = call <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16> %a, <8 x i16> %b, i32 6)
				ret <8 x i16> %res
				}

				define <4 x i32> @splice_v4i32(<4 x i32> %a, <4 x i32> %b) #0 {
				; CHECK-LABEL: splice_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #8
				; CHECK-NEXT: ret
				%res = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> %a, <4 x i32> %b, i32 2)
				ret <4 x i32> %res
				}

				define <2 x i64> @splice_v2i64(<2 x i64> %a, <2 x i64> %b) #0 {
				; CHECK-LABEL: splice_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #8
				; CHECK-NEXT: ret
				%res = call <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64> %a, <2 x i64> %b, i32 1)
				ret <2 x i64> %res
				}

				define <8 x half> @splice_v8f16(<8 x half> %a, <8 x half> %b) #0 {
				; CHECK-LABEL: splice_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #2
				; CHECK-NEXT: ret
				%res = call <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half> %a, <8 x half> %b, i32 7)
				ret <8 x half> %res
				}

				define <4 x float> @splice_v4f32(<4 x float> %a, <4 x float> %b) #0 {
				; CHECK-LABEL: splice_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #4
				; CHECK-NEXT: ret
				%res = call <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float> %a, <4 x float> %b, i32 3)
				ret <4 x float> %res
				}

				define <2 x double> @splice_v2f64(<2 x double> %a, <2 x double> %b) #0 {
				; CHECK-LABEL: splice_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v0.16b, v1.16b, #8
				; CHECK-NEXT: ret
				%res = call <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> %a, <2 x double> %b, i32 1)
				ret <2 x double> %res
				}

				; Verify promote type legalisation works as expected.
				define <2 x i8> @splice_v2i8(<2 x i8> %a, <2 x i8> %b) #0 {
				; CHECK-LABEL: splice_v2i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.8b, v0.8b, v1.8b, #4
				; CHECK-NEXT: ret
				%res = call <2 x i8> @llvm.experimental.vector.splice.v2i8(<2 x i8> %a, <2 x i8> %b, i32 1)
				ret <2 x i8> %res
				}

				; Verify splitvec type legalisation works as expected.
				define <8 x i32> @splice_v8i32(<8 x i32> %a, <8 x i32> %b) #0 {
				; CHECK-LABEL: splice_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v1.16b, v2.16b, #4
				; CHECK-NEXT: ext v1.16b, v2.16b, v3.16b, #4
				; CHECK-NEXT: ret
				%res = call <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32> %a, <8 x i32> %b, i32 3)
				ret <8 x i32> %res
				}

				; Verify splitvec type legalisation works as expected.
				define <16 x float> @splice_v16f32(<16 x float> %a, <16 x float> %b) #0 {
				; CHECK-LABEL: splice_v16f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ext v0.16b, v1.16b, v2.16b, #12
				; CHECK-NEXT: ext v1.16b, v2.16b, v3.16b, #12
				; CHECK-NEXT: ext v2.16b, v3.16b, v4.16b, #12
				; CHECK-NEXT: ext v3.16b, v4.16b, v5.16b, #12
				; CHECK-NEXT: ret
				%res = call <16 x float> @llvm.experimental.vector.splice.v16f32(<16 x float> %a, <16 x float> %b, i32 9)
				ret <16 x float> %res
				}

				declare <2 x i8> @llvm.experimental.vector.splice.v2i8(<2 x i8>, <2 x i8>, i32)
				declare <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8>, <16 x i8>, i32)
				declare <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16>, <8 x i16>, i32)
				declare <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32>, <4 x i32>, i32)
				declare <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32>, <8 x i32>, i32)
				declare <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64>, <2 x i64>, i32)
				declare <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half>, <8 x half>, i32)
				declare <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float>, <4 x float>, i32)
				declare <16 x float> @llvm.experimental.vector.splice.v16f32(<16 x float>, <16 x float>, i32)
				declare <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double>, <2 x double>, i32)

				attributes #0 = { nounwind "target-features"="+neon" }
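The `ext` immediates in the CHECK lines above follow from the trailing element count: the result starts `NumElts - TrailingElts` elements into the first operand, and `ext` takes that offset in bytes. A small sketch of the arithmetic, using a hypothetical helper `extImmediate` and assuming 16-byte NEON registers unless stated otherwise:

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): byte offset that appears as
// the EXT immediate for a fixed-length splice with the given trailing count.
unsigned extImmediate(unsigned NumElts, unsigned EltBytes,
                      unsigned TrailingElts, unsigned RegBytes = 16) {
  assert(TrailingElts <= NumElts && "invalid number of trailing elements");
  // The result starts (NumElts - TrailingElts) elements into V1.
  const unsigned StartByte = (NumElts - TrailingElts) * EltBytes;
  // For types split across several registers, only the offset within one
  // register shows up in each EXT instruction.
  return StartByte % RegBytes;
}
```

For example, `splice_v16i8` with a trailing count of 15 gives `#1`, `splice_v8i32` (split across two registers) with a count of 3 gives `#4`, and `splice_v2i8`, which is promoted to 32-bit lanes in an 8-byte register, gives `#4` for a count of 1.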

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -verify-machineinstrs < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				;
				; VECTOR_SPLICE
				;

				define <vscale x 16 x i8> @splice_nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: splice_nxv16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: st1b { z1.b }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 16)
				ret <vscale x 16 x i8> %res
				}

				define <vscale x 16 x i8> @splice_nxv16i8_1(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: splice_nxv16i8_1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: st1b { z1.b }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 1)
				ret <vscale x 16 x i8> %res
				}

				; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
				define <vscale x 16 x i8> @splice_nxv16i8_clamped(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: splice_nxv16i8_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: rdvl x9, #1
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #17
				; CHECK-NEXT: cmp x9, #17 // =17
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: st1b { z1.b }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, i32 17)
				ret <vscale x 16 x i8> %res
				}

				define <vscale x 8 x i16> @splice_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: splice_nxv8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 8)
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 8 x i16> @splice_nxv8i16_1(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: splice_nxv8i16_1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #2 // =2
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 1)
				ret <vscale x 8 x i16> %res
				}

				; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
				define <vscale x 8 x i16> @splice_nxv8i16_clamped(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: splice_nxv8i16_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: rdvl x9, #1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #18
				; CHECK-NEXT: cmp x9, #18 // =18
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, i32 9)
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 4 x i32> @splice_nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: splice_nxv4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 4)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @splice_nxv4i32_1(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: splice_nxv4i32_1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #4 // =4
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 1)
				ret <vscale x 4 x i32> %res
				}

				; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
				define <vscale x 4 x i32> @splice_nxv4i32_clamped(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: splice_nxv4i32_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: rdvl x9, #1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #20
				; CHECK-NEXT: cmp x9, #20 // =20
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i32 5)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @splice_nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: splice_nxv2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 2)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @splice_nxv2i64_1(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: splice_nxv2i64_1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #8 // =8
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 1)
				ret <vscale x 2 x i64> %res
				}

				; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
				define <vscale x 2 x i64> @splice_nxv2i64_clamped(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: splice_nxv2i64_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: rdvl x9, #1
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #24
				; CHECK-NEXT: cmp x9, #24 // =24
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, i32 3)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 8 x half> @splice_nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: splice_nxv8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 8)
				ret <vscale x 8 x half> %res
				}

				define <vscale x 8 x half> @splice_nxv8f16_1(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: splice_nxv8f16_1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #2 // =2
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 1)
				ret <vscale x 8 x half> %res
				}

				; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
				define <vscale x 8 x half> @splice_nxv8f16_clamped(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: splice_nxv8f16_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: rdvl x9, #1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #18
				; CHECK-NEXT: cmp x9, #18 // =18
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, i32 9)
				ret <vscale x 8 x half> %res
				}

				define <vscale x 4 x float> @splice_nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: splice_nxv4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 4)
				ret <vscale x 4 x float> %res
				}

				define <vscale x 4 x float> @splice_nxv4f32_1(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: splice_nxv4f32_1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #4 // =4
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 1)
				ret <vscale x 4 x float> %res
				}

				; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
				define <vscale x 4 x float> @splice_nxv4f32_clamped(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: splice_nxv4f32_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: rdvl x9, #1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #20
				; CHECK-NEXT: cmp x9, #20 // =20
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, i32 5)
				ret <vscale x 4 x float> %res
				}

				define <vscale x 2 x double> @splice_nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: splice_nxv2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 2)
				ret <vscale x 2 x double> %res
				}

				define <vscale x 2 x double> @splice_nxv2f64_1(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: splice_nxv2f64_1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #8 // =8
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 1)
				ret <vscale x 2 x double> %res
				}

				; Ensure number of trailing elements is clamped when we cannot prove it's less than VL.
				define <vscale x 2 x double> @splice_nxv2f64_clamped(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: splice_nxv2f64_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: rdvl x9, #1
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #24
				; CHECK-NEXT: cmp x9, #24 // =24
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, i32 3)
				ret <vscale x 2 x double> %res
				}

				; Ensure predicate based splice is promoted to use ZPRs.
				define <vscale x 2 x i1> @splice_nxv2i1(<vscale x 2 x i1> %a, <vscale x 2 x i1> %b) #0 {
				; CHECK-LABEL: splice_nxv2i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: mov z0.d, p0/z, #1 // =0x1
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z1.d, p1/z, #1 // =0x1
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #8 // =8
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: and z0.d, z0.d, #0x1
				; CHECK-NEXT: cmpne p0.d, p0/z, z0.d, #0
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> %a, <vscale x 2 x i1> %b, i32 1)
				ret <vscale x 2 x i1> %res
				}

				; Ensure predicate based splice is promoted to use ZPRs.
				define <vscale x 4 x i1> @splice_nxv4i1(<vscale x 4 x i1> %a, <vscale x 4 x i1> %b) #0 {
				; CHECK-LABEL: splice_nxv4i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: mov z0.s, p0/z, #1 // =0x1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z1.s, p1/z, #1 // =0x1
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #4 // =4
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: and z0.s, z0.s, #0x1
				; CHECK-NEXT: cmpne p0.s, p0/z, z0.s, #0
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> %a, <vscale x 4 x i1> %b, i32 1)
				ret <vscale x 4 x i1> %res
				}

				; Ensure predicate based splice is promoted to use ZPRs.
				define <vscale x 8 x i1> @splice_nxv8i1(<vscale x 8 x i1> %a, <vscale x 8 x i1> %b) #0 {
				; CHECK-LABEL: splice_nxv8i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: mov z0.h, p0/z, #1 // =0x1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z1.h, p1/z, #1 // =0x1
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: st1h { z1.h }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #2 // =2
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [x8]
				; CHECK-NEXT: and z0.h, z0.h, #0x1
				; CHECK-NEXT: cmpne p0.h, p0/z, z0.h, #0
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> %a, <vscale x 8 x i1> %b, i32 1)
				ret <vscale x 8 x i1> %res
				}

				; Ensure predicate based splice is promoted to use ZPRs.
				define <vscale x 16 x i1> @splice_nxv16i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b) #0 {
				; CHECK-LABEL: splice_nxv16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: mov z0.b, p0/z, #1 // =0x1
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mov z1.b, p1/z, #1 // =0x1
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: st1b { z1.b }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x8]
				; CHECK-NEXT: and z0.b, z0.b, #0x1
				; CHECK-NEXT: cmpne p0.b, p0/z, z0.b, #0
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b, i32 1)
				ret <vscale x 16 x i1> %res
				}
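The four predicate tests above share one pattern in their expected output: each i1 lane is widened to an integer lane (`mov z, p/z, #1`), the widened vectors are spliced through the stack, and the result is narrowed back to a predicate (`and ... #0x1` followed by `cmpne ... #0`). The following is a minimal Python sketch of that sequence, not code from the patch. The helper name `splice_predicates` is hypothetical, lanes are modelled as list elements rather than bytes, and the immediate is assumed to be a non-negative trailing-element count, as the `sub` offsets in the CHECK lines suggest.

```python
def splice_predicates(pa, pb, imm):
    """Hypothetical model of splicing two predicate (i1) vectors.

    pa, pb: lists of bools with the same length VL.
    imm:    assumed to be the number of trailing lanes taken from `pa`.
    """
    if len(pa) != len(pb):
        raise ValueError("both predicates must have the same number of lanes")
    if imm < 0:
        raise ValueError("this sketch only models a non-negative immediate")
    vl = len(pa)

    # mov z0.<T>, p0/z, #1: promote each i1 lane to an integer holding 0 or 1.
    za = [1 if bit else 0 for bit in pa]
    zb = [1 if bit else 0 for bit in pb]

    # st1 / st1: store both promoted vectors back to back on the stack.
    stack = za + zb

    # addvl + sub: step back `imm` lanes from the end of the first vector.
    # The min() mirrors the clamp used by the *_clamped tests; the predicate
    # tests use imm = 1, which never needs clamping.
    start = vl - min(vl, imm)

    # ld1: load one full vector starting at that position.
    z = stack[start:start + vl]

    # and z0, z0, #0x1 ; cmpne p0, p0/z, z0, #0: narrow back to i1 lanes.
    return [(lane & 1) != 0 for lane in z]
```

With `imm = 1`, the result holds the last lane of the first predicate followed by the leading lanes of the second, which corresponds to the single-element offsets (`#8`, `#4`, `#2`, `#1`) subtracted in the four tests.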

				; Verify promote type legalisation works as expected.
				define <vscale x 2 x i8> @splice_nxv2i8(<vscale x 2 x i8> %a, <vscale x 2 x i8> %b) #0 {
				; CHECK-LABEL: splice_nxv2i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-2
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: st1d { z1.d }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: addvl x8, x8, #1
				; CHECK-NEXT: sub x8, x8, #16 // =16
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x8]
				; CHECK-NEXT: addvl sp, sp, #2
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i8> @llvm.experimental.vector.splice.nxv2i8(<vscale x 2 x i8> %a, <vscale x 2 x i8> %b, i32 2)
				ret <vscale x 2 x i8> %res
				}

				; Verify splitvec type legalisation works as expected.
				define <vscale x 8 x i32> @splice_nxv8i32(<vscale x 8 x i32> %a, <vscale x 8 x i32> %b) #0 {
				; CHECK-LABEL: splice_nxv8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-4
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z3.s }, p0, [x8, #3, mul vl]
				; CHECK-NEXT: st1w { z2.s }, p0, [x8, #2, mul vl]
				; CHECK-NEXT: addvl x8, x8, #2
				; CHECK-NEXT: sub x8, x8, #32 // =32
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x8, #1, mul vl]
				; CHECK-NEXT: addvl sp, sp, #4
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> %a, <vscale x 8 x i32> %b, i32 8)
				ret <vscale x 8 x i32> %res
				}

				; Verify splitvec type legalisation works as expected.
				define <vscale x 16 x float> @splice_nxv16f32_clamped(<vscale x 16 x float> %a, <vscale x 16 x float> %b) #0 {
				; CHECK-LABEL: splice_nxv16f32_clamped:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-8
				; CHECK-NEXT: rdvl x9, #4
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov x8, sp
				; CHECK-NEXT: mov w10, #68
				; CHECK-NEXT: cmp x9, #68 // =68
				; CHECK-NEXT: st1w { z3.s }, p0, [x8, #3, mul vl]
				; CHECK-NEXT: st1w { z2.s }, p0, [x8, #2, mul vl]
				; CHECK-NEXT: st1w { z1.s }, p0, [x8, #1, mul vl]
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: st1w { z7.s }, p0, [x8, #7, mul vl]
				; CHECK-NEXT: st1w { z4.s }, p0, [x8, #4, mul vl]
				; CHECK-NEXT: st1w { z5.s }, p0, [x8, #5, mul vl]
				; CHECK-NEXT: st1w { z6.s }, p0, [x8, #6, mul vl]
				; CHECK-NEXT: addvl x8, x8, #4
				; CHECK-NEXT: csel x9, x9, x10, lo
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x8]
				; CHECK-NEXT: ld1w { z1.s }, p0/z, [x8, #1, mul vl]
				; CHECK-NEXT: ld1w { z2.s }, p0/z, [x8, #2, mul vl]
				; CHECK-NEXT: ld1w { z3.s }, p0/z, [x8, #3, mul vl]
				; CHECK-NEXT: addvl sp, sp, #8
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = call <vscale x 16 x float> @llvm.experimental.vector.splice.nxv16f32(<vscale x 16 x float> %a, <vscale x 16 x float> %b, i32 17)
				ret <vscale x 16 x float> %res
				}

				declare <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1>, <vscale x 2 x i1>, i32)
				declare <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1>, <vscale x 4 x i1>, i32)
				declare <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1>, <vscale x 8 x i1>, i32)
				declare <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1>, <vscale x 16 x i1>, i32)
				declare <vscale x 2 x i8> @llvm.experimental.vector.splice.nxv2i8(<vscale x 2 x i8>, <vscale x 2 x i8>, i32)
				declare <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>, i32)
				declare <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, i32)
				declare <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, i32)
				declare <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32>, i32)
				declare <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, i32)
				declare <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32)
				declare <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, i32)
				declare <vscale x 16 x float> @llvm.experimental.vector.splice.nxv16f32(<vscale x 16 x float>, <vscale x 16 x float>, i32)
				declare <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, i32)

				attributes #0 = { nounwind "target-features"="+sve" }
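For readers following the CHECK lines in this test file, the stack-based expansion they exercise can be modelled roughly as below. This is a minimal Python sketch, not code from the patch. The helper name `splice_via_stack` and its parameters are hypothetical, the model counts elements where the generated code counts bytes, and it assumes (from the `sub` and `cmp`/`csel` sequences in the expected output) that the immediate in this revision of the diff is a non-negative trailing-element count that is clamped to the runtime vector length.

```python
def splice_via_stack(a, b, imm):
    """Hypothetical model of the expansion seen in the CHECK lines.

    a, b: lists of equal length VL (the runtime element count).
    imm:  assumed to be the number of trailing elements taken from `a`.
    """
    if len(a) != len(b):
        raise ValueError("both inputs must have the same number of elements")
    if imm < 0:
        raise ValueError("this sketch only models a non-negative immediate")
    vl = len(a)

    # st1 { z0 }, [sp] ; st1 { z1 }, [x8, #1, mul vl]:
    # store `a` and then `b` contiguously on the stack.
    stack = list(a) + list(b)

    # rdvl / cmp / csel ..., lo: clamp the offset to the vector length so the
    # load never starts before the beginning of `a`.
    offset = min(vl, imm)

    # addvl x8, x8, #1 ; sub x8, x8, <offset>:
    # start from the end of `a` and step back by the clamped offset.
    start = vl - offset

    # ld1 { z0 }, [x8]: load one full vector from that position.
    return stack[start:start + vl]
```

For example, `splice_via_stack(list("ABCD"), list("EFGH"), 3)` yields `B, C, D, E`, matching the trailing-element-count example in the summary. An immediate larger than the vector length, as in the `*_clamped` tests, returns the first vector unchanged.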

Revision Contents

Diff 316728

llvm/docs/LangRef.rst

llvm/include/llvm/CodeGen/ISDOpcodes.h

llvm/include/llvm/IR/Intrinsics.td

llvm/include/llvm/Target/TargetSelectionDAG.td

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

llvm/lib/CodeGen/TargetLoweringBase.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/named-vector-shuffles-neon.ll

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
