This is an archive of the discontinued LLVM Phabricator instance.

Differential D91362

[SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
Closed · Public

Authored by joechrisellis on Nov 12 2020, 8:49 AM.

Details

Reviewers

paulwalker-arm
peterwaller-arm
fpetrogalli
DavidTruby
cameron.mcinally

Commits

rG80c33de2d3c5: [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics

Summary

This commit adds two new intrinsics.

llvm.vector.insert: used to insert a vector into another vector starting at a given index.

llvm.vector.extract: used to extract a subvector from a larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.
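
As an illustration of the summary above (not part of the patch), the fixed-width behaviour of the two intrinsics can be modelled on plain `std::vector<int>` values. The helper names `insertSubvector` and `extractSubvector` are hypothetical, and the sketch assumes the index is already known to be in bounds; scalable vectors are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of llvm.vector.insert for fixed-width vectors:
// returns a copy of `vec` in which elements [idx, idx + subvec.size())
// are overwritten with `subvec`.
std::vector<int> insertSubvector(std::vector<int> vec,
                                 const std::vector<int> &subvec,
                                 std::size_t idx) {
  assert(idx + subvec.size() <= vec.size() && "insertion overruns vec");
  for (std::size_t i = 0; i < subvec.size(); ++i)
    vec[idx + i] = subvec[i];
  return vec;
}

// Hypothetical model of llvm.vector.extract for fixed-width vectors:
// returns the `count` elements of `vec` starting at element `idx`.
std::vector<int> extractSubvector(const std::vector<int> &vec,
                                  std::size_t count, std::size_t idx) {
  assert(idx + count <= vec.size() && "extraction overruns vec");
  auto first = vec.begin() + static_cast<std::ptrdiff_t>(idx);
  return std::vector<int>(first, first + static_cast<std::ptrdiff_t>(count));
}
```

In this model, inserting a subvector and then extracting the same range returns the subvector unchanged.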

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

joechrisellis created this revision. Nov 12 2020, 8:49 AM

Herald added a project: Restricted Project. Nov 12 2020, 8:49 AM

Herald added subscribers: llvm-commits, ecnelises, jdoerfert, hiraditya.

joechrisellis requested review of this revision. Nov 12 2020, 8:49 AM

sdesmalen added a reviewer: cameron.mcinally. Nov 12 2020, 8:52 AM

sdesmalen added a subscriber: sdesmalen.

Was there an RFC?
Why should there be two ways to do the same thing?

Presumably you're planning to add some tests?

llvm/docs/LangRef.rst
16147	The meaning of `idx` is more nuanced than this, as shown by the description attached to the ISD node it is equivalent to. I think this is worth calling out in the langref.
llvm/include/llvm/IR/Intrinsics.td
1620	For both intrinsics you'll need to add ImmArg properties for the index parameter. This is a requirement of the ISD node plus we don't want to open up the possibility of variable insertions.

I would expect the intrinsic to canonicalize to shufflevector with the appropriate shuffle mask for the case where both the vectors are fixed-width?

llvm/docs/LangRef.rst
16108	Can you add/change the example for inserting a fixed-width vector into a scalable vector? That is one of the main reasons for adding this intrinsic.

@lebedev.ri: shufflevector only has minimal support for scalable vectors with only the splat case covered (and even that has its quirks). With the recent change to force the mask to be an ArrayRef there is no way to represent arbitrary shuffles and at the same time the implementation forced a requirement that scalable vector data inputs imply a scalable vector result.

Harbormaster completed remote builds in B78632: Diff 304855. Nov 12 2020, 9:25 AM

In D91362#2391668, @paulwalker-arm wrote:

@lebedev.ri: shufflevector only has minimal support for scalable vectors with only the splat case covered (and even that has its quirks). With the recent change to force the mask to be an ArrayRef there is no way to represent arbitrary shuffles and at the same time the implementation forced a requirement that scalable vector data inputs imply a scalable vector result.

I still want to see all this explanation put both into an RFC and into the patch's description.

cameron.mcinally added inline comments. Nov 12 2020, 10:17 AM

llvm/docs/LangRef.rst
16108	This caught me off guard. What are the semantics of:
declare <4 x float> @llvm.vector.insert.v4f32(<4 x float> %subvec, <4 x float> %vec, i64 %idx)
Is that a full vector copy? I would have expected the subvector VL to be strictly less than the destination VL. E.g.:
declare <4 x float> @llvm.vector.insert.v2f32.v4f32(<2 x float> %subvec, <4 x float> %vec, i64 %idx)

fhahn added a subscriber: fhahn. Nov 12 2020, 10:18 AM

Thanks @lebedev.ri, I incorrectly assumed exposing existing functionality wouldn't require an RFC. I'm putting together an RFC to cover support for other scalable vector shuffles so I'll include these within that.

Address @sdesmalen and @paulwalker-arm review comments.

Add tests.
Use ImmArg for idx parameter.
Include example of inserting a fixed-width vector into a scalable vector in LangRef.rst.
Use more precise description of idx in LangRef.rst.

Harbormaster completed remote builds in B78781: Diff 305190. Nov 13 2020, 10:04 AM

peterwaller-arm added inline comments. Nov 16 2020, 2:30 AM

llvm/docs/LangRef.rst
16103	Other intrinsics using this phrase also indicate in a following sentence what types are supported, I think we should do that here.
16113	The term "subvector" does not appear to be defined anywhere in the langref that I can see. I wonder if the term could be introduced along with scalable vectors in the "Vector Type" section. Alternatively there may be a way to write this sentence without using the jargon "subvector". For example: "The 'llvm.vector.insert.*' intrinsics insert a non-scalable vector into a scalable vector at the given index. Conceptually, this can be used to build a scalable vector out of non-scalable vectors.".
16140	Same comment here for a sentence describing valid types.
16150	Same comment here for the subvector language. Suggested language: "The 'llvm.vector.extract.*' intrinsics extract a non-scalable vector from a scalable vector at the given index. Conceptually, this can be used to decompose a scalable vector into non-scalable parts."

simoll added a subscriber: simoll. Nov 16 2020, 2:37 AM

Address @peterwaller-arm's comments.
Fix off-by-one error in Intrinsics.td.

Harbormaster completed remote builds in B79072: Diff 305703. Nov 17 2020, 2:43 AM

paulwalker-arm added inline comments. Nov 17 2020, 10:40 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6938–6939	These look reversed to me. The operand order should match that used by the ISD node.
6952–6955	The result type for both extract and insert should be based on the intrinsic's return type. Something like `EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType())`

Address @paulwalker-arm's comments.

Reverse operand order of insert intrinsic.
Base result type on the return types of the intrinsics.

Auto-generate tests.

Harbormaster completed remote builds in B79795: Diff 307077. Nov 23 2020, 8:20 AM

Harbormaster completed remote builds in B79800: Diff 307082. Nov 23 2020, 8:31 AM

Fix failing tests (forgot to write stderr to %t).

Harbormaster completed remote builds in B79809: Diff 307099. Nov 23 2020, 9:30 AM

Planning to add canonicalization to shufflevector for fixed-width vectors.

@lebedev.ri, hi! We submitted an RFC for named shuffle intrinsics to the llvm-dev mailing list (here). Do you think this is sufficient? Note there are still a few changes left to do on this patch (mainly moving the intrinsics to the experimental namespace and adding canonicalisation to shufflevector for the fixed case). 😄

Implement canonicalisation for the fixed case.
Move intrinsic into llvm.experimental namespace.
Update documentation and tests accordingly.

Harbormaster completed remote builds in B80238: Diff 307867. Nov 26 2020, 8:25 AM

Do we need to protect against mismatched element types? Or does legalization handle those exts/truncs?

%retval = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16(<vscale x 8 x i16> %vec, <8 x i8> %subvec, i64 0)

Address @cameron.mcinally's comment regarding protecting against mismatched element types.

Herald added a subscriber: dexonsmith. Dec 1 2020, 10:41 AM

In D91362#2425780, @cameron.mcinally wrote:
Do we need to protect against mismatched element types? Or does legalization handle those exts/truncs?
%retval = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16(<vscale x 8 x i16> %vec, <8 x i8> %subvec, i64 0)

Good idea -- added some verifier checks for this.

Harbormaster completed remote builds in B80694: Diff 308710. Dec 1 2020, 11:34 AM

LGTM

I think @ctetreau's "first class citizen" argument on the RFC has merit though. But this patch is a good first step if we're not ready to extend ShuffleVector yet. I personally would like to see ShuffleVector extended eventually, since it would be easier to optimize.

@ctetreau's RFC comment can be found here:

http://lists.llvm.org/pipermail/llvm-dev/2020-December/146981.html

This revision is now accepted and ready to land. Dec 4 2020, 9:54 AM

Rebase atop main.

joechrisellis added a child revision: D92761: [clang][AArch64][SVE] Avoid going through memory for VLAT <-> VLST casts. Dec 7 2020, 6:37 AM

Harbormaster completed remote builds in B81288: Diff 309899. Dec 7 2020, 7:11 AM

Fixup tests.

Harbormaster completed remote builds in B81457: Diff 310194. Dec 8 2020, 8:10 AM

This revision was landed with ongoing or failed builds. Dec 9 2020, 3:09 AM

Closed by commit rG80c33de2d3c5: [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics (authored by joechrisellis).

This revision was automatically updated to reflect the committed changes.

joechrisellis added a commit: rG80c33de2d3c5: [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics.

dongAxis1944 added a subscriber: dongAxis1944. Jan 18 2021, 7:02 PM

Revision Contents

Path	Size
llvm/docs/LangRef.rst	75 lines
llvm/include/llvm/IR/Intrinsics.td	9 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	21 lines
llvm/lib/IR/Verifier.cpp	20 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	96 lines
llvm/test/CodeGen/AArch64/sve-extract-vector.ll	138 lines
llvm/test/CodeGen/AArch64/sve-insert-vector.ll	184 lines
llvm/test/Transforms/InstCombine/canonicalize-vector-extract.ll	139 lines
llvm/test/Transforms/InstCombine/canonicalize-vector-insert.ll	147 lines
llvm/test/Verifier/extract-vector-mismatched-element-types.ll	9 lines
llvm/test/Verifier/insert-vector-mismatched-element-types.ll	9 lines

Diff 310478

llvm/docs/LangRef.rst

If the intrinsic call has the ``nnan`` fast-math flag, then the operation can
assume that NaNs are not present in the input vector.

Arguments:
""""""""""
The argument to this intrinsic must be a vector of floating-point values.

				'``llvm.experimental.vector.insert``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""
				This is an overloaded intrinsic. You can use ``llvm.experimental.vector.insert``
				to insert a fixed-width vector into a scalable vector, but not the other way
				around.

				::

				declare <vscale x 4 x float> @llvm.experimental.vector.insert.v4f32(<vscale x 4 x float> %vec, <4 x float> %subvec, i64 %idx)
				declare <vscale x 2 x double> @llvm.experimental.vector.insert.v2f64(<vscale x 2 x double> %vec, <2 x double> %subvec, i64 %idx)

				Overview:
				"""""""""

				The '``llvm.experimental.vector.insert.*``' intrinsics insert a vector into another vector
				starting from a given index. The return type matches the type of the vector we
				insert into. Conceptually, this can be used to build a scalable vector out of
				non-scalable vectors.

				Arguments:
				""""""""""

				The ``vec`` is the vector which ``subvec`` will be inserted into.
				The ``subvec`` is the vector that will be inserted.

				``idx`` represents the starting element number at which ``subvec`` will be
				inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum
				vector length. If ``subvec`` is a scalable vector, ``idx`` is first scaled by
				the runtime scaling factor of ``subvec``. The elements of ``vec`` starting at
				``idx`` are overwritten with ``subvec``. Elements ``idx`` through (``idx`` +
				num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition
				cannot be determined statically but is false at runtime, then the result vector
				is undefined.


				'``llvm.experimental.vector.extract``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""
				This is an overloaded intrinsic. You can use
				``llvm.experimental.vector.extract`` to extract a fixed-width vector from a
				scalable vector, but not the other way around.

				::

				declare <4 x float> @llvm.experimental.vector.extract.v4f32(<vscale x 4 x float> %vec, i64 %idx)
				declare <2 x double> @llvm.experimental.vector.extract.v2f64(<vscale x 2 x double> %vec, i64 %idx)

				Overview:
				"""""""""

				The '``llvm.experimental.vector.extract.*``' intrinsics extract a vector from
				within another vector starting from a given index. The return type must be
				explicitly specified. Conceptually, this can be used to decompose a scalable
				vector into non-scalable parts.

				Arguments:
				""""""""""

				The ``vec`` is the vector from which we will extract a subvector.

				The ``idx`` specifies the starting element number within ``vec`` from which a
				subvector is extracted. ``idx`` must be a constant multiple of the known-minimum
				vector length of the result type. If the result type is a scalable vector,
				``idx`` is first scaled by the result type's runtime scaling factor. Elements
				``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector
				indices. If this condition cannot be determined statically but is false at
				runtime, then the result vector is undefined. The ``idx`` parameter must be a
				vector index constant type (for most targets this will be an integer pointer
				type).

Matrix Intrinsics
-----------------

Operations on matrixes requiring shape information (like number of rows/columns
or the memory layout) can be expressed using the matrix intrinsics. These
intrinsics require matrix dimensions to be passed as immediate arguments, and
matrixes are passed and returned as vectors. This means that for a ``R`` x
``C`` matrix, element ``i`` of column ``j`` is at index ``j * R + i`` in the
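
The ``idx`` rules in the LangRef text above combine a static requirement (a multiple of the subvector's known minimum length) with a bounds requirement that may only be decidable at run time for scalable vectors. The following is a minimal sketch of those two checks, written for illustration and not taken from the patch. `VecShape`, `runtimeElts`, `idxIsAligned` and `accessInBounds` are hypothetical names, and `vscale` is passed in explicitly as an assumed runtime value.

```cpp
#include <cassert>

// Hypothetical description of a vector type: <minElts x T> when fixed,
// <vscale x minElts x T> when scalable.
struct VecShape {
  unsigned minElts;
  bool scalable;
};

// Number of elements the type holds for a given runtime vscale.
unsigned runtimeElts(VecShape shape, unsigned vscale) {
  return shape.scalable ? shape.minElts * vscale : shape.minElts;
}

// Static rule: idx must be a multiple of the subvector's known minimum
// vector length.
bool idxIsAligned(unsigned idx, VecShape sub) {
  assert(sub.minElts != 0 && "vector types hold at least one element");
  return idx % sub.minElts == 0;
}

// Bounds rule: elements idx .. idx + num_elements(sub) - 1 must be valid
// indices of vec. When sub is scalable, idx is first scaled by vscale.
bool accessInBounds(unsigned idx, VecShape vec, VecShape sub,
                    unsigned vscale) {
  unsigned start = sub.scalable ? idx * vscale : idx;
  return start + runtimeElts(sub, vscale) <= runtimeElts(vec, vscale);
}
```

For example, accessing a `<4 x float>` at index 4 of a `<vscale x 4 x float>` is out of bounds when `vscale` is 1 but in bounds when `vscale` is 2, which is the case the LangRef text describes as not statically determinable.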

llvm/include/llvm/IR/Intrinsics.td

Context: def int_preserve_struct_access_index : DefaultAttrsIntrinsic<[llvm_anyptr_ty],
llvm_i32_ty],
[IntrNoMem,
ImmArg<ArgIndex<1>>,
ImmArg<ArgIndex<2>>]>;

//===---------- Intrinsics to query properties of scalable vectors --------===//
def int_vscale : DefaultAttrsIntrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;

		//===---------- Intrinsics to perform subvector insertion/extraction ------===//
		def int_experimental_vector_insert : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
		[LLVMMatchType<0>, llvm_anyvector_ty, llvm_i64_ty],
		[IntrNoMem, ImmArg<ArgIndex<2>>]>;

		def int_experimental_vector_extract : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
		[llvm_anyvector_ty, llvm_i64_ty],
		[IntrNoMem, ImmArg<ArgIndex<1>>]>;

//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
// Target-specific intrinsics
//===----------------------------------------------------------------------===//

include "llvm/IR/IntrinsicsPowerPC.td"
include "llvm/IR/IntrinsicsX86.td"

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


Context: case Intrinsic::get_active_lane_mask: {
    SDValue VectorTripCount = DAG.getBuildVector(VecTy, DL, OpsTripCount);
    SDValue SetCC = DAG.getSetCC(DL, CCVT, VectorInduction.getValue(0),
                                 VectorTripCount, ISD::CondCode::SETULT);
    setValue(&I, DAG.getNode(ISD::AND, DL, CCVT,
                             DAG.getNOT(DL, VectorInduction.getValue(1), CCVT),
                             SetCC));
    return;
  }
		  case Intrinsic::experimental_vector_insert: {
		    auto DL = getCurSDLoc();

		    SDValue Vec = getValue(I.getOperand(0));
		    SDValue SubVec = getValue(I.getOperand(1));
		    SDValue Index = getValue(I.getOperand(2));
		    EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
		    setValue(&I, DAG.getNode(ISD::INSERT_SUBVECTOR, DL, ResultVT, Vec, SubVec,
		                             Index));
		    return;
		  }
		  case Intrinsic::experimental_vector_extract: {
		    auto DL = getCurSDLoc();

		    SDValue Vec = getValue(I.getOperand(0));
		    SDValue Index = getValue(I.getOperand(1));
		    EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), I.getType());

		    setValue(&I, DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResultVT, Vec, Index));
		    return;
		  }
  }
}

void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
    const ConstrainedFPIntrinsic &FPI) {
  SDLoc sdl = getCurSDLoc();

  const TargetLowering &TLI = DAG.getTargetLoweringInfo();

llvm/lib/IR/Verifier.cpp

Context: Assert(cast<FixedVectorType>(ResultTy)->getNumElements() ==
           "Result of a matrix operation does not fit in the returned vector!");

    if (Stride)
      Assert(Stride->getZExtValue() >= NumRows->getZExtValue(),
             "Stride must be greater or equal than the number of rows!", IF);

    break;
  }
		  case Intrinsic::experimental_vector_insert: {
		    VectorType *VecTy = cast<VectorType>(Call.getArgOperand(0)->getType());
		    VectorType *SubVecTy = cast<VectorType>(Call.getArgOperand(1)->getType());

		    Assert(VecTy->getElementType() == SubVecTy->getElementType(),
		           "experimental_vector_insert parameters must have the same element "
		           "type.",
		           &Call);
		    break;
		  }
		  case Intrinsic::experimental_vector_extract: {
		    VectorType *ResultTy = cast<VectorType>(Call.getType());
		    VectorType *VecTy = cast<VectorType>(Call.getArgOperand(0)->getType());

		    Assert(ResultTy->getElementType() == VecTy->getElementType(),
		           "experimental_vector_extract result must have the same element "
		           "type as the input vector.",
		           &Call);
		    break;
		  }
  };
}

/// Carefully grab the subprogram from a local scope.
///
/// This carefully grabs the subprogram from a local scope, avoiding the
/// built-in assertions that would typically fire.
static DISubprogram *getSubprogram(Metadata *LocalScope) {

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

Context: if (match(NextInst,
        }
        replaceOperand(*II, 0, Builder.CreateAnd(CurrCond, NextCond));
      }
      eraseInstFromFunction(*NextInst);
      return II;
    }
    break;
  }
		  case Intrinsic::experimental_vector_insert: {
		    Value *Vec = II->getArgOperand(0);
		    Value *SubVec = II->getArgOperand(1);
		    Value *Idx = II->getArgOperand(2);
		    auto *DstTy = dyn_cast<FixedVectorType>(II->getType());
		    auto *VecTy = dyn_cast<FixedVectorType>(Vec->getType());
		    auto *SubVecTy = dyn_cast<FixedVectorType>(SubVec->getType());

		    // Only canonicalize if the destination vector, Vec, and SubVec are all
		    // fixed vectors.
		    if (DstTy && VecTy && SubVecTy) {
		      unsigned DstNumElts = DstTy->getNumElements();
		      unsigned VecNumElts = VecTy->getNumElements();
		      unsigned SubVecNumElts = SubVecTy->getNumElements();
		      unsigned IdxN = cast<ConstantInt>(Idx)->getZExtValue();

		      // The result of this call is undefined if IdxN is not a constant multiple
		      // of the SubVec's minimum vector length OR the insertion overruns Vec.
		      if (IdxN % SubVecNumElts != 0 || IdxN + SubVecNumElts > VecNumElts) {
		        replaceInstUsesWith(CI, UndefValue::get(CI.getType()));
		        return eraseInstFromFunction(CI);
		      }

		      // An insert that entirely overwrites Vec with SubVec is a nop.
		      if (VecNumElts == SubVecNumElts) {
		        replaceInstUsesWith(CI, SubVec);
		        return eraseInstFromFunction(CI);
		      }

		      // Widen SubVec into a vector of the same width as Vec, since
		      // shufflevector requires the two input vectors to be the same width.
		      // Elements beyond the bounds of SubVec within the widened vector are
		      // undefined.
		      SmallVector<int, 8> WidenMask;
		      unsigned i;
		      for (i = 0; i != SubVecNumElts; ++i)
		        WidenMask.push_back(i);
		      for (; i != VecNumElts; ++i)
		        WidenMask.push_back(UndefMaskElem);

		      Value *WidenShuffle = Builder.CreateShuffleVector(
		          SubVec, llvm::UndefValue::get(SubVecTy), WidenMask);

		      SmallVector<int, 8> Mask;
		      for (unsigned i = 0; i != IdxN; ++i)
		        Mask.push_back(i);
		      for (unsigned i = DstNumElts; i != DstNumElts + SubVecNumElts; ++i)
		        Mask.push_back(i);
		      for (unsigned i = IdxN + SubVecNumElts; i != DstNumElts; ++i)
		        Mask.push_back(i);

		      Value *Shuffle = Builder.CreateShuffleVector(Vec, WidenShuffle, Mask);
		      replaceInstUsesWith(CI, Shuffle);
		      return eraseInstFromFunction(CI);
		    }
		    break;
		  }
		  case Intrinsic::experimental_vector_extract: {
		    Value *Vec = II->getArgOperand(0);
		    Value *Idx = II->getArgOperand(1);

		    auto *DstTy = dyn_cast<FixedVectorType>(II->getType());
		    auto *VecTy = dyn_cast<FixedVectorType>(Vec->getType());

		    // Only canonicalize if the destination vector and Vec are fixed
		    // vectors.
		    if (DstTy && VecTy) {
		      unsigned DstNumElts = DstTy->getNumElements();
		      unsigned VecNumElts = VecTy->getNumElements();
		      unsigned IdxN = cast<ConstantInt>(Idx)->getZExtValue();

		      // The result of this call is undefined if IdxN is not a constant multiple
		      // of the result type's minimum vector length OR the extraction overruns
		      // Vec.
		      if (IdxN % DstNumElts != 0 || IdxN + DstNumElts > VecNumElts) {
		        replaceInstUsesWith(CI, UndefValue::get(CI.getType()));
		        return eraseInstFromFunction(CI);
		      }

		      // Extracting the entirety of Vec is a nop.
		      if (VecNumElts == DstNumElts) {
		        replaceInstUsesWith(CI, Vec);
		        return eraseInstFromFunction(CI);
		      }

		      SmallVector<int, 8> Mask;
		      for (unsigned i = 0; i != DstNumElts; ++i)
		        Mask.push_back(IdxN + i);

		      Value *Shuffle =
		          Builder.CreateShuffleVector(Vec, UndefValue::get(VecTy), Mask);
		      replaceInstUsesWith(CI, Shuffle);
		      return eraseInstFromFunction(CI);
		    }
		    break;
		  }
  default: {
    // Handle target specific intrinsics
    Optional<Instruction *> V = targetInstCombineIntrinsic(*II);
    if (V.hasValue())
      return V.getValue();
    break;
  }
  }
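
The mask construction used by the InstCombine canonicalisation above can be checked in isolation. The sketch below mirrors the three loops of the insert case and the single loop of the extract case using plain `std::vector<int>`; it is an illustration, not code from the patch. The names `buildWidenMask`, `buildInsertMask`, `buildExtractMask` and `applyShuffle` are hypothetical, `kUndefLane` stands in for LLVM's `UndefMaskElem`, and undefined lanes are arbitrarily modelled as 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kUndefLane = -1; // stand-in for LLVM's UndefMaskElem

// Mask that widens a subElts-wide vector to vecElts lanes; the extra lanes
// are undefined.
std::vector<int> buildWidenMask(unsigned subElts, unsigned vecElts) {
  std::vector<int> mask;
  for (unsigned i = 0; i != subElts; ++i)
    mask.push_back(static_cast<int>(i));
  for (unsigned i = subElts; i != vecElts; ++i)
    mask.push_back(kUndefLane);
  return mask;
}

// Mask for shuffle(Vec, WidenedSubVec): lanes before idx come from Vec, then
// subElts lanes from the second operand, then the remaining lanes of Vec.
std::vector<int> buildInsertMask(unsigned vecElts, unsigned subElts,
                                 unsigned idx) {
  std::vector<int> mask;
  for (unsigned i = 0; i != idx; ++i)
    mask.push_back(static_cast<int>(i));
  for (unsigned i = vecElts; i != vecElts + subElts; ++i)
    mask.push_back(static_cast<int>(i));
  for (unsigned i = idx + subElts; i != vecElts; ++i)
    mask.push_back(static_cast<int>(i));
  return mask;
}

// Mask selecting dstElts consecutive lanes of Vec starting at idx.
std::vector<int> buildExtractMask(unsigned dstElts, unsigned idx) {
  std::vector<int> mask;
  for (unsigned i = 0; i != dstElts; ++i)
    mask.push_back(static_cast<int>(idx + i));
  return mask;
}

// Two-operand shuffle in the style of shufflevector: mask values below
// a.size() select from a, the rest select from b.
std::vector<int> applyShuffle(const std::vector<int> &a,
                              const std::vector<int> &b,
                              const std::vector<int> &mask) {
  assert(a.size() == b.size() && "operands must have the same width");
  std::vector<int> out;
  for (int m : mask) {
    if (m == kUndefLane) {
      out.push_back(0); // undefined lane, modelled as 0
      continue;
    }
    std::size_t lane = static_cast<std::size_t>(m);
    assert(lane < 2 * a.size() && "mask element out of range");
    out.push_back(lane < a.size() ? a[lane] : b[lane - a.size()]);
  }
  return out;
}
```

Widening the subvector first is what lets the second shuffle take two operands of equal width, which is the constraint the patch's comment refers to.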

llvm/test/CodeGen/AArch64/sve-extract-vector.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s --check-prefixes=CHECK
				; RUN: FileCheck --check-prefix=WARN --allow-empty %s < %t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				; Should codegen to a nop, since idx is zero.
				define <2 x i64> @extract_v2i64_nxv2i64(<vscale x 2 x i64> %vec) nounwind {
				; CHECK-LABEL: extract_v2i64_nxv2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%retval = call <2 x i64> @llvm.experimental.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64> %vec, i64 0)
				ret <2 x i64> %retval
				}

				; Goes through memory currently; idx != 0.
				define <2 x i64> @extract_v2i64_nxv2i64_idx1(<vscale x 2 x i64> %vec) nounwind {
				; CHECK-LABEL: extract_v2i64_nxv2i64_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cntd x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: lsl x8, x8, #3
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: ldr q0, [x9, x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <2 x i64> @llvm.experimental.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64> %vec, i64 1)
				ret <2 x i64> %retval
				}

				; Should codegen to a nop, since idx is zero.
				define <4 x i32> @extract_v4i32_nxv4i32(<vscale x 4 x i32> %vec) nounwind {
				; CHECK-LABEL: extract_v4i32_nxv4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%retval = call <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %vec, i64 0)
				ret <4 x i32> %retval
				}

				; Goes through memory currently; idx != 0.
				define <4 x i32> @extract_v4i32_nxv4i32_idx1(<vscale x 4 x i32> %vec) nounwind {
				; CHECK-LABEL: extract_v4i32_nxv4i32_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cntw x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: lsl x8, x8, #2
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: ldr q0, [x9, x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %vec, i64 1)
				ret <4 x i32> %retval
				}

				; Should codegen to a nop, since idx is zero.
				define <8 x i16> @extract_v8i16_nxv8i16(<vscale x 8 x i16> %vec) nounwind {
				; CHECK-LABEL: extract_v8i16_nxv8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%retval = call <8 x i16> @llvm.experimental.vector.extract.v8i16.nxv8i16(<vscale x 8 x i16> %vec, i64 0)
				ret <8 x i16> %retval
				}

				; Goes through memory currently; idx != 0.
				define <8 x i16> @extract_v8i16_nxv8i16_idx1(<vscale x 8 x i16> %vec) nounwind {
				; CHECK-LABEL: extract_v8i16_nxv8i16_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cnth x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: lsl x8, x8, #1
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: ldr q0, [x9, x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <8 x i16> @llvm.experimental.vector.extract.v8i16.nxv8i16(<vscale x 8 x i16> %vec, i64 1)
				ret <8 x i16> %retval
				}

				; Should codegen to a nop, since idx is zero.
				define <16 x i8> @extract_v16i8_nxv16i8(<vscale x 16 x i8> %vec) nounwind {
				; CHECK-LABEL: extract_v16i8_nxv16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%retval = call <16 x i8> @llvm.experimental.vector.extract.v16i8.nxv16i8(<vscale x 16 x i8> %vec, i64 0)
				ret <16 x i8> %retval
				}

				; Goes through memory currently; idx != 0.
				define <16 x i8> @extract_v16i8_nxv16i8_idx1(<vscale x 16 x i8> %vec) nounwind {
				; CHECK-LABEL: extract_v16i8_nxv16i8_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: rdvl x8, #1
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: ldr q0, [x9, x8]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <16 x i8> @llvm.experimental.vector.extract.v16i8.nxv16i8(<vscale x 16 x i8> %vec, i64 1)
				ret <16 x i8> %retval
				}

				declare <2 x i64> @llvm.experimental.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64>, i64)
				declare <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32>, i64)
				declare <8 x i16> @llvm.experimental.vector.extract.v8i16.nxv8i16(<vscale x 8 x i16>, i64)
				declare <16 x i8> @llvm.experimental.vector.extract.v16i8.nxv16i8(<vscale x 16 x i8>, i64)

llvm/test/CodeGen/AArch64/sve-insert-vector.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s --check-prefixes=CHECK
				; RUN: FileCheck --check-prefix=WARN --allow-empty %s < %t
				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				define <vscale x 2 x i64> @insert_v2i64_nxv2i64(<vscale x 2 x i64> %vec, <2 x i64> %subvec) nounwind {
				; CHECK-LABEL: insert_v2i64_nxv2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cntd x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #0 // =0
				; CHECK-NEXT: csel x8, x8, xzr, lo
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: lsl x8, x8, #3
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v2i64(<vscale x 2 x i64> %vec, <2 x i64> %subvec, i64 0)
				ret <vscale x 2 x i64> %retval
				}

				define <vscale x 2 x i64> @insert_v2i64_nxv2i64_idx1(<vscale x 2 x i64> %vec, <2 x i64> %subvec) nounwind {
				; CHECK-LABEL: insert_v2i64_nxv2i64_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cntd x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: lsl x8, x8, #3
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1d { z0.d }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v2i64(<vscale x 2 x i64> %vec, <2 x i64> %subvec, i64 1)
				ret <vscale x 2 x i64> %retval
				}

				define <vscale x 4 x i32> @insert_v4i32_nxv4i32(<vscale x 4 x i32> %vec, <4 x i32> %subvec) nounwind {
				; CHECK-LABEL: insert_v4i32_nxv4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cntw x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #0 // =0
				; CHECK-NEXT: csel x8, x8, xzr, lo
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: lsl x8, x8, #2
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> %vec, <4 x i32> %subvec, i64 0)
				ret <vscale x 4 x i32> %retval
				}

				define <vscale x 4 x i32> @insert_v4i32_nxv4i32_idx1(<vscale x 4 x i32> %vec, <4 x i32> %subvec) nounwind {
				; CHECK-LABEL: insert_v4i32_nxv4i32_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cntw x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: lsl x8, x8, #2
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1w { z0.s }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> %vec, <4 x i32> %subvec, i64 1)
				ret <vscale x 4 x i32> %retval
				}

				define <vscale x 8 x i16> @insert_v8i16_nxv8i16(<vscale x 8 x i16> %vec, <8 x i16> %subvec) nounwind {
				; CHECK-LABEL: insert_v8i16_nxv8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cnth x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #0 // =0
				; CHECK-NEXT: csel x8, x8, xzr, lo
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: lsl x8, x8, #1
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16> %vec, <8 x i16> %subvec, i64 0)
				ret <vscale x 8 x i16> %retval
				}

				define <vscale x 8 x i16> @insert_v8i16_nxv8i16_idx1(<vscale x 8 x i16> %vec, <8 x i16> %subvec) nounwind {
				; CHECK-LABEL: insert_v8i16_nxv8i16_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: cnth x8
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: lsl x8, x8, #1
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1h { z0.h }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1h { z0.h }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16> %vec, <8 x i16> %subvec, i64 1)
				ret <vscale x 8 x i16> %retval
				}

				define <vscale x 16 x i8> @insert_v16i8_nxv16i8(<vscale x 16 x i8> %vec, <16 x i8> %subvec) nounwind {
				; CHECK-LABEL: insert_v16i8_nxv16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: rdvl x8, #1
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #0 // =0
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: csel x8, x8, xzr, lo
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8> %vec, <16 x i8> %subvec, i64 0)
				ret <vscale x 16 x i8> %retval
				}

				define <vscale x 16 x i8> @insert_v16i8_nxv16i8_idx1(<vscale x 16 x i8> %vec, <16 x i8> %subvec) nounwind {
				; CHECK-LABEL: insert_v16i8_nxv16i8_idx1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: rdvl x8, #1
				; CHECK-NEXT: sub x8, x8, #1 // =1
				; CHECK-NEXT: cmp x8, #1 // =1
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: csinc x8, x8, xzr, lo
				; CHECK-NEXT: mov x9, sp
				; CHECK-NEXT: st1b { z0.b }, p0, [sp]
				; CHECK-NEXT: str q1, [x9, x8]
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [sp]
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%retval = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8> %vec, <16 x i8> %subvec, i64 1)
				ret <vscale x 16 x i8> %retval
				}

				declare <vscale x 2 x i64> @llvm.experimental.vector.insert.nxv2i64.v2i64(<vscale x 2 x i64>, <2 x i64>, i64)
				declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64)
				declare <vscale x 8 x i16> @llvm.experimental.vector.insert.nxv8i16.v8i16(<vscale x 8 x i16>, <8 x i16>, i64)
				declare <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v16i8(<vscale x 16 x i8>, <16 x i8>, i64)
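Reading the CHECK lines in the two SVE test files above, the spill-based sequences appear to clamp the index to the last element of the runtime vector before scaling it to a byte offset into the stack slot. A minimal sketch of that arithmetic follows. The function name `clampedByteOffset` and its parameters are hypothetical and do not come from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model, not code from the patch: the byte offset into the
// stack slot that the spill-based sequences appear to compute.
//   NumElts  - runtime element count (the value cntd/cntw/cnth/rdvl yields)
//   Idx      - the intrinsic's constant index operand
//   EltBytes - element size in bytes
uint64_t clampedByteOffset(uint64_t NumElts, uint64_t Idx, uint64_t EltBytes) {
  assert(NumElts > 0 && "a vector has at least one element");
  // sub x8, x8, #1 ; cmp x8, #Idx ; csinc/csel ..., lo  =>  min(NumElts - 1, Idx)
  uint64_t Clamped = std::min(NumElts - 1, Idx);
  // lsl x8, x8, #log2(EltBytes)
  return Clamped * EltBytes;
}
```

For example, with four 32-bit elements and `idx = 1` the sketch yields a 4-byte offset, which corresponds to the `lsl x8, x8, #2` followed by `ldr q0, [x9, x8]` in `extract_v4i32_nxv4i32_idx1`.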

llvm/test/Transforms/InstCombine/canonicalize-vector-extract.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instcombine -S | FileCheck %s

				; llvm.experimental.vector.extract canonicalizes to shufflevector in the fixed case. In the
				; scalable case, we lower to the EXTRACT_SUBVECTOR ISD node.

				declare <10 x i32> @llvm.experimental.vector.extract.v10i32.v8i32(<8 x i32> %vec, i64 %idx)
				declare <2 x i32> @llvm.experimental.vector.extract.v2i32.v4i32(<8 x i32> %vec, i64 %idx)
				declare <3 x i32> @llvm.experimental.vector.extract.v3i32.v8i32(<8 x i32> %vec, i64 %idx)
				declare <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %vec, i64 %idx)
				declare <4 x i32> @llvm.experimental.vector.extract.v4i32.v8i32(<8 x i32> %vec, i64 %idx)
				declare <8 x i32> @llvm.experimental.vector.extract.v8i32.v8i32(<8 x i32> %vec, i64 %idx)

				; ============================================================================ ;
				; Trivial cases
				; ============================================================================ ;

				; Extracting the entirety of a vector is a nop.
				define <8 x i32> @trivial_nop(<8 x i32> %vec) {
				; CHECK-LABEL: @trivial_nop(
				; CHECK-NEXT: ret <8 x i32> [[VEC:%.*]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.extract.v8i32.v8i32(<8 x i32> %vec, i64 0)
				ret <8 x i32> %1
				}

				; ============================================================================ ;
				; Valid canonicalizations
				; ============================================================================ ;

				define <2 x i32> @valid_extraction_a(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_a(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <2 x i32> <i32 0, i32 1>
				; CHECK-NEXT: ret <2 x i32> [[TMP1]]
				;
				%1 = call <2 x i32> @llvm.experimental.vector.extract.v2i32.v4i32(<8 x i32> %vec, i64 0)
				ret <2 x i32> %1
				}

				define <2 x i32> @valid_extraction_b(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_b(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <2 x i32> <i32 2, i32 3>
				; CHECK-NEXT: ret <2 x i32> [[TMP1]]
				;
				%1 = call <2 x i32> @llvm.experimental.vector.extract.v2i32.v4i32(<8 x i32> %vec, i64 2)
				ret <2 x i32> %1
				}

				define <2 x i32> @valid_extraction_c(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_c(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <2 x i32> <i32 4, i32 5>
				; CHECK-NEXT: ret <2 x i32> [[TMP1]]
				;
				%1 = call <2 x i32> @llvm.experimental.vector.extract.v2i32.v4i32(<8 x i32> %vec, i64 4)
				ret <2 x i32> %1
				}

				define <2 x i32> @valid_extraction_d(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_d(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <2 x i32> <i32 6, i32 7>
				; CHECK-NEXT: ret <2 x i32> [[TMP1]]
				;
				%1 = call <2 x i32> @llvm.experimental.vector.extract.v2i32.v4i32(<8 x i32> %vec, i64 6)
				ret <2 x i32> %1
				}

				define <4 x i32> @valid_extraction_e(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_e(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: ret <4 x i32> [[TMP1]]
				;
				%1 = call <4 x i32> @llvm.experimental.vector.extract.v4i32.v8i32(<8 x i32> %vec, i64 0)
				ret <4 x i32> %1
				}

				define <4 x i32> @valid_extraction_f(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_f(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				; CHECK-NEXT: ret <4 x i32> [[TMP1]]
				;
				%1 = call <4 x i32> @llvm.experimental.vector.extract.v4i32.v8i32(<8 x i32> %vec, i64 4)
				ret <4 x i32> %1
				}

				define <3 x i32> @valid_extraction_g(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_g(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <3 x i32> <i32 0, i32 1, i32 2>
				; CHECK-NEXT: ret <3 x i32> [[TMP1]]
				;
				%1 = call <3 x i32> @llvm.experimental.vector.extract.v3i32.v8i32(<8 x i32> %vec, i64 0)
				ret <3 x i32> %1
				}

				define <3 x i32> @valid_extraction_h(<8 x i32> %vec) {
				; CHECK-LABEL: @valid_extraction_h(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> undef, <3 x i32> <i32 3, i32 4, i32 5>
				; CHECK-NEXT: ret <3 x i32> [[TMP1]]
				;
				%1 = call <3 x i32> @llvm.experimental.vector.extract.v3i32.v8i32(<8 x i32> %vec, i64 3)
				ret <3 x i32> %1
				}

				; ============================================================================ ;
				; Invalid canonicalizations
				; ============================================================================ ;

				; Idx must be a constant multiple of the destination vector's length,
				; otherwise the result is undefined.
				define <4 x i32> @idx_not_constant_multiple(<8 x i32> %vec) {
				; CHECK-LABEL: @idx_not_constant_multiple(
				; CHECK-NEXT: ret <4 x i32> undef
				;
				%1 = call <4 x i32> @llvm.experimental.vector.extract.v4i32.v8i32(<8 x i32> %vec, i64 1)
				ret <4 x i32> %1
				}

				; If the extraction overruns the vector, the result is undefined.
				define <10 x i32> @extract_overrun(<8 x i32> %vec) {
				; CHECK-LABEL: @extract_overrun(
				; CHECK-NEXT: ret <10 x i32> undef
				;
				%1 = call <10 x i32> @llvm.experimental.vector.extract.v10i32.v8i32(<8 x i32> %vec, i64 0)
				ret <10 x i32> %1
				}

				; ============================================================================ ;
				; Scalable cases
				; ============================================================================ ;

				; Scalable extractions should not be canonicalized. This will be lowered to the
				; EXTRACT_SUBVECTOR ISD node later.
				define <4 x i32> @scalable_extract(<vscale x 4 x i32> %vec) {
				; CHECK-LABEL: @scalable_extract(
				; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> [[VEC:%.*]], i64 0)
				; CHECK-NEXT: ret <4 x i32> [[TMP1]]
				;
				%1 = call <4 x i32> @llvm.experimental.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %vec, i64 0)
				ret <4 x i32> %1
				}
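The fixed-width cases in this test file follow a simple rule: the shuffle mask is the run of `DstLen` consecutive indices starting at `idx`, and the result is undef when `idx` is not a multiple of the result length or the range overruns the source. The sketch below models that rule only. `extractMask` is a hypothetical helper and is not the code added to InstCombineCalls.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not code from the patch. Fills Mask with the
// shufflevector mask for extracting DstLen elements of a SrcLen-element
// fixed-width vector starting at Idx. Returns false in the cases where the
// tests above expect undef.
bool extractMask(uint64_t SrcLen, uint64_t DstLen, uint64_t Idx,
                 std::vector<int> &Mask) {
  Mask.clear();
  // Idx must be a multiple of the result length.
  if (DstLen == 0 || Idx % DstLen != 0)
    return false;
  // The extracted range must not overrun the source vector.
  if (DstLen > SrcLen || Idx > SrcLen - DstLen)
    return false;
  for (uint64_t I = 0; I < DstLen; ++I)
    Mask.push_back(static_cast<int>(Idx + I));
  return true;
}
```

Under these assumptions, `extractMask(8, 3, 3, M)` fills `M` with `3, 4, 5`, matching `valid_extraction_h`, while `extractMask(8, 4, 1, M)` and `extractMask(8, 10, 0, M)` return false, matching the two undef cases.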

llvm/test/Transforms/InstCombine/canonicalize-vector-insert.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instcombine -S | FileCheck %s

				; llvm.experimental.vector.insert canonicalizes to shufflevector in the fixed case. In the
				; scalable case, we lower to the INSERT_SUBVECTOR ISD node.

				declare <8 x i32> @llvm.experimental.vector.insert.v8i32.v2i32(<8 x i32> %vec, <2 x i32> %subvec, i64 %idx)
				declare <8 x i32> @llvm.experimental.vector.insert.v8i32.v3i32(<8 x i32> %vec, <3 x i32> %subvec, i64 %idx)
				declare <8 x i32> @llvm.experimental.vector.insert.v8i32.v4i32(<8 x i32> %vec, <4 x i32> %subvec, i64 %idx)
				declare <8 x i32> @llvm.experimental.vector.insert.v8i32.v8i32(<8 x i32> %vec, <8 x i32> %subvec, i64 %idx)
				declare <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> %vec, <4 x i32> %subvec, i64 %idx)

				; ============================================================================ ;
				; Trivial cases
				; ============================================================================ ;

				; An insert that entirely overwrites an <n x ty> with another <n x ty> is a
				; nop.
				define <8 x i32> @trivial_nop(<8 x i32> %vec, <8 x i32> %subvec) {
				; CHECK-LABEL: @trivial_nop(
				; CHECK-NEXT: ret <8 x i32> [[SUBVEC:%.*]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v8i32(<8 x i32> %vec, <8 x i32> %subvec, i64 0)
				ret <8 x i32> %1
				}

				; ============================================================================ ;
				; Valid canonicalizations
				; ============================================================================ ;

				define <8 x i32> @valid_insertion_a(<8 x i32> %vec, <2 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_a(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[SUBVEC:%.*]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[VEC:%.*]], <8 x i32> <i32 0, i32 1, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v2i32(<8 x i32> %vec, <2 x i32> %subvec, i64 0)
				ret <8 x i32> %1
				}

				define <8 x i32> @valid_insertion_b(<8 x i32> %vec, <2 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_b(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[SUBVEC:%.*]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v2i32(<8 x i32> %vec, <2 x i32> %subvec, i64 2)
				ret <8 x i32> %1
				}

				define <8 x i32> @valid_insertion_c(<8 x i32> %vec, <2 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_c(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[SUBVEC:%.*]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v2i32(<8 x i32> %vec, <2 x i32> %subvec, i64 4)
				ret <8 x i32> %1
				}

				define <8 x i32> @valid_insertion_d(<8 x i32> %vec, <2 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_d(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[SUBVEC:%.*]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v2i32(<8 x i32> %vec, <2 x i32> %subvec, i64 6)
				ret <8 x i32> %1
				}

				define <8 x i32> @valid_insertion_e(<8 x i32> %vec, <4 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_e(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[SUBVEC:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[VEC:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v4i32(<8 x i32> %vec, <4 x i32> %subvec, i64 0)
				ret <8 x i32> %1
				}

				define <8 x i32> @valid_insertion_f(<8 x i32> %vec, <4 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_f(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[SUBVEC:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v4i32(<8 x i32> %vec, <4 x i32> %subvec, i64 4)
				ret <8 x i32> %1
				}

				define <8 x i32> @valid_insertion_g(<8 x i32> %vec, <3 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_g(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <3 x i32> [[SUBVEC:%.*]], <3 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[VEC:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 12, i32 13, i32 14, i32 15>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v3i32(<8 x i32> %vec, <3 x i32> %subvec, i64 0)
				ret <8 x i32> %1
				}

				define <8 x i32> @valid_insertion_h(<8 x i32> %vec, <3 x i32> %subvec) {
				; CHECK-LABEL: @valid_insertion_h(
				; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <3 x i32> [[SUBVEC:%.*]], <3 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[VEC:%.*]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 6, i32 7>
				; CHECK-NEXT: ret <8 x i32> [[TMP2]]
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v3i32(<8 x i32> %vec, <3 x i32> %subvec, i64 3)
				ret <8 x i32> %1
				}

				; ============================================================================ ;
				; Invalid canonicalizations
				; ============================================================================ ;

				; Idx must be a constant multiple of the subvector's minimum vector
				; length, otherwise the result is undefined.
				define <8 x i32> @idx_not_constant_multiple(<8 x i32> %vec, <4 x i32> %subvec) {
				; CHECK-LABEL: @idx_not_constant_multiple(
				; CHECK-NEXT: ret <8 x i32> undef
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v4i32(<8 x i32> %vec, <4 x i32> %subvec, i64 2)
				ret <8 x i32> %1
				}

				; If the insertion overruns the vector, the result is undefined.
				define <8 x i32> @insert_overrun(<8 x i32> %vec, <8 x i32> %subvec) {
				; CHECK-LABEL: @insert_overrun(
				; CHECK-NEXT: ret <8 x i32> undef
				;
				%1 = call <8 x i32> @llvm.experimental.vector.insert.v8i32.v8i32(<8 x i32> %vec, <8 x i32> %subvec, i64 4)
				ret <8 x i32> %1
				}

				; ============================================================================ ;
				; Scalable cases
				; ============================================================================ ;

				; Scalable insertions should not be canonicalized. This will be lowered to the
				; INSERT_SUBVECTOR ISD node later.
				define <vscale x 4 x i32> @scalable_insert(<vscale x 4 x i32> %vec, <4 x i32> %subvec) {
				; CHECK-LABEL: @scalable_insert(
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> [[VEC:%.*]], <4 x i32> [[SUBVEC:%.*]], i64 0)
				; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP1]]
				;
				%1 = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> %vec, <4 x i32> %subvec, i64 0)
				ret <vscale x 4 x i32> %1
				}
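The fixed-width insert cases above expand to two shuffles: one that widens the subvector to the destination length, and one that blends it into `%vec`. The sketch below models the two masks. `insertMasks` is a hypothetical helper, not the code added to InstCombineCalls.cpp, and it always treats `%vec` as the first shuffle operand. The `idx = 0` CHECK lines show the same blend with the operands commuted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not code from the patch. Models inserting a
// SubLen-element subvector into a VecLen-element fixed-width vector at Idx.
//   Widen: mask that pads the subvector to VecLen elements (-1 means undef).
//   Blend: mask for shufflevector(%vec, %widened); indices >= VecLen select
//          from the widened subvector.
// Returns false in the cases where the tests above expect undef.
bool insertMasks(uint64_t VecLen, uint64_t SubLen, uint64_t Idx,
                 std::vector<int> &Widen, std::vector<int> &Blend) {
  Widen.clear();
  Blend.clear();
  // Idx must be a multiple of the subvector length.
  if (SubLen == 0 || Idx % SubLen != 0)
    return false;
  // The inserted range must not overrun the destination vector.
  if (SubLen > VecLen || Idx > VecLen - SubLen)
    return false;
  for (uint64_t I = 0; I < VecLen; ++I) {
    Widen.push_back(I < SubLen ? static_cast<int>(I) : -1);
    bool InSub = I >= Idx && I < Idx + SubLen;
    Blend.push_back(static_cast<int>(InSub ? VecLen + (I - Idx) : I));
  }
  return true;
}
```

Under these assumptions, `insertMasks(8, 2, 2, W, B)` gives the blend mask `0, 1, 8, 9, 4, 5, 6, 7` seen in `valid_insertion_b`, and `insertMasks(8, 4, 2, W, B)` returns false, matching `idx_not_constant_multiple`.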

llvm/test/Verifier/extract-vector-mismatched-element-types.ll

This file was added.

				; RUN: not opt -verify -S < %s 2>&1 >/dev/null | FileCheck %s

				; CHECK: experimental_vector_extract result must have the same element type as the input vector.
				define <16 x i16> @invalid_mismatched_element_types(<vscale x 16 x i8> %vec) nounwind {
				%retval = call <16 x i16> @llvm.experimental.vector.extract.v16i16.nxv16i8(<vscale x 16 x i8> %vec, i64 0)
				ret <16 x i16> %retval
				}

				declare <16 x i16> @llvm.experimental.vector.extract.v16i16.nxv16i8(<vscale x 16 x i8>, i64)

llvm/test/Verifier/insert-vector-mismatched-element-types.ll

This file was added.

				; RUN: not opt -verify -S < %s 2>&1 >/dev/null | FileCheck %s

				; CHECK: experimental_vector_insert parameters must have the same element type.
				define <vscale x 16 x i8> @invalid_mismatched_element_types(<vscale x 16 x i8> %vec, <4 x i16> %subvec) nounwind {
				%retval = call <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v4i16(<vscale x 16 x i8> %vec, <4 x i16> %subvec, i64 0)
				ret <vscale x 16 x i8> %retval
				}

				declare <vscale x 16 x i8> @llvm.experimental.vector.insert.nxv16i8.v4i16(<vscale x 16 x i8>, <4 x i16>, i64)
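The two Verifier tests above each trigger a single element-type check. A rough model of that check is sketched below. `VecTy`, `checkExtract` and `checkInsert` are hypothetical names, and the real check in Verifier.cpp operates on LLVM's own type classes rather than strings. Only the diagnostic text is taken from the CHECK lines.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a vector type; this sketch does not use LLVM's
// type classes.
struct VecTy {
  std::string Elt;   // element type, e.g. "i8"
  unsigned MinElts;  // minimum element count
  bool Scalable;     // true for <vscale x N x ty>
};

// Returns the diagnostic the extract test checks for, or an empty string
// when the element types agree.
std::string checkExtract(const VecTy &Result, const VecTy &Vec) {
  if (Result.Elt != Vec.Elt)
    return "experimental_vector_extract result must have the same element "
           "type as the input vector.";
  return "";
}

// Returns the diagnostic the insert test checks for, or an empty string
// when the element types agree.
std::string checkInsert(const VecTy &Vec, const VecTy &SubVec) {
  if (Vec.Elt != SubVec.Elt)
    return "experimental_vector_insert parameters must have the same element "
           "type.";
  return "";
}
```

In this model, extracting a `<16 x i16>` result from a `<vscale x 16 x i8>` input yields the first diagnostic, and inserting `<4 x i16>` into `<vscale x 16 x i8>` yields the second.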

Revision Contents

Diff 310478