This is an archive of the discontinued LLVM Phabricator instance.

Differential D104630

[AArch64][CostModel] Add cost model for experimental.vector.splice
ClosedPublic

Authored by CarolineConcatto on Jun 21 2021, 3:37 AM.

Details

Reviewers

sdesmalen
RKSimon
bsmith
ABataev
david-arm

Commits

rGa2c5c5605576: [AArch64][CostModel] Add cost model for experimental.vector.splice

Summary

This patch adds a new ShuffleKind, SK_Splice, and handles its cost in
getShuffleCost.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	2,080 ms	x64 debian > libFuzzer.libFuzzer::dataflow.test
	122,920 ms	x64 debian > libFuzzer.libFuzzer::only-some-bytes-fork.test
	8,520 ms	x64 debian > libFuzzer.libFuzzer::only-some-bytes.test

Event Timeline

CarolineConcatto created this revision. Jun 21 2021, 3:37 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Jun 21 2021, 3:37 AM

CarolineConcatto requested review of this revision. Jun 21 2021, 3:37 AM

Herald added a project: Restricted Project. Jun 21 2021, 3:37 AM

Herald added a subscriber: llvm-commits.

The cost for fixed-length vectors with splice could be improved by changing

InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy)

to

InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy, int Index)

and making the loop run only up to Index instead of over all elements.
But that depends on whether it is fine to create a new shuffle kind, SK_Splice, for experimental.vector.splice.

CarolineConcatto edited the summary of this revision. Jun 21 2021, 3:50 AM

CarolineConcatto added reviewers: sdesmalen, RKSimon, bsmith, ABataev, david-arm.

Harbormaster completed remote builds in B110163: Diff 353323. Jun 21 2021, 6:19 AM

Matt added a subscriber: Matt. Jun 21 2021, 8:51 AM

sdesmalen added inline comments. Jun 22 2021, 1:43 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1967	Separate from the type, I think we'll need to distinguish the costs based on the value of the index as well. Given two scalable vectors <x0, x1, x2, x3> and <y0, y1, y2, y3>: for positive offsets we can use SVE's EXT instruction, e.g. to splice at offset #1, the result of the splice will be <x1, x2, x3, y0>. For a negative offset, we can't use EXT, but we can instead use SPLICE, which requires (generating) a predicate. For a negative offset of 1 we need a predicate of <0, 0, 0, 1>. This means the operation can be done using whilelt+not+splice, so negative offsets would be more expensive.
1968–1980	At the moment, the costs for these are actually quite high because they're expanded to two stores and one reload. That said, I'd prefer not to reflect that in the cost model because this is not the desired code-gen and we should favour getting more scalable vectorization to get more testing coverage.
1981–1984	The predicates require two stores, a reload and an additional compare operation. Since predicates don't have a dedicated instruction, it should be fair to model the cost as that of two stores, a reload and a compare.
llvm/test/Analysis/CostModel/AArch64/splice.ll
35	nv?
60	odd spaces.

Any reason you didn't use the update_analyze_test_checks.py script?
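The splice semantics discussed in the comment on line 1967 can be modelled on fixed-length vectors. The following is only a reference sketch, not code from the patch: `spliceModel` is a hypothetical name, and it assumes both inputs have the same length N and that the immediate satisfies -N <= Imm < N. A non-negative immediate starts extracting at that index of concat(A, B); a negative immediate takes the trailing -Imm elements of A and fills the rest from B.

```cpp
#include <cassert>
#include <vector>

// Reference model of a vector splice on two equally sized vectors.
// Assumption (not from the patch): -N <= Imm < N, where N is the length.
std::vector<int> spliceModel(const std::vector<int> &A,
                             const std::vector<int> &B, int Imm) {
  assert(A.size() == B.size() && "inputs must have the same length");
  const int N = static_cast<int>(A.size());
  assert(Imm >= -N && Imm < N && "immediate out of range");

  // A negative immediate selects the trailing -Imm elements of A, which is
  // the same as starting at index N + Imm of concat(A, B).
  const int Start = Imm >= 0 ? Imm : N + Imm;

  std::vector<int> Result;
  Result.reserve(A.size());
  for (int I = 0; I < N; ++I) {
    const int Pos = Start + I;
    Result.push_back(Pos < N ? A[Pos] : B[Pos - N]);
  }
  return Result;
}
```

With A = <x0, x1, x2, x3> and B = <y0, y1, y2, y3>, an immediate of 1 yields <x1, x2, x3, y0>, and an immediate of -1 yields <x3, y0, y1, y2>, which corresponds to the <0, 0, 0, 1> predicate mentioned in the comment.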

Improve cost for scalable vector

@RKSimon I have changed the RUN line to be accepted by update_analyze_test_checks.py.
I did not run the script on the sve-intrinsics.ll file, but the CHECK lines for splice were generated by update_analyze_test_checks.py.

@sdesmalen The cost now takes the index into account, and it is different when the scalar type is i1.
For a negative index there is a predicate mask, plus a compare and a select instruction to choose the correct elements.
That is why it uses getCmpSelInstrCost.
For predicates there is a table that holds the combined cost of promoting and truncating.
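How these pieces add up can be sketched in isolation. This is an illustration under stated assumptions, not the patch's code: `UnitCosts`, `spliceCostSketch` and all the unit values are hypothetical, and the real numbers come from the target's cost tables and from getCmpSelInstrCost.

```cpp
#include <cassert>

// Hypothetical per-operation costs, chosen only for illustration.
struct UnitCosts {
  int Splice = 1;  // base cost of the splice on a legal type
  int Compare = 1; // integer compare that builds the predicate mask
  int Select = 1;  // select that picks the correct elements
  int ZExt = 1;    // promote an i1 predicate vector to an integer vector
  int Trunc = 1;   // truncate back to the i1 predicate vector
};

// Sketch of the cost composition described above. NumLegalParts models the
// number of legal vectors the original type is split into.
int spliceCostSketch(int Index, bool IsPredicate, int NumLegalParts,
                     const UnitCosts &C = UnitCosts()) {
  int Cost = C.Splice;
  if (Index < 0)     // negative index: mask generation plus select
    Cost += C.Compare + C.Select;
  if (IsPredicate)   // i1 elements: operate on a promoted type
    Cost += C.ZExt + C.Trunc;
  return Cost * NumLegalParts;
}
```

With every unit cost set to 1, a positive index on a legal non-predicate type costs 1, a negative index costs 3, and a negative index on a predicate type split into two legal parts costs 10.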

sdesmalen added inline comments. Jun 25 2021, 6:32 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1815	is it even needed to pass a Kind or Mask in the first place, they seem unused.
1816	This can be a switch statement instead? Also, how about giving it a name like `getPromotedTypeForPredicate`?
1843	This could just use `getCastInstrCost` instead of the custom table?
1853	The compare is always an integer compare, i.e. `cmp ge <0, 1, 2, 3, ... N-1>, <idx, idx, idx, ... idx>`
1863	I think the cost has to find one, otherwise we have an unhandled/illegal type. So instead of `if`, this should have an `assert` that Entry != nullptr.

Harbormaster completed remote builds in B110974: Diff 354469. Jun 25 2021, 7:34 AM

Address Sander's comment

CarolineConcatto marked 2 inline comments as done. Jun 28 2021, 2:24 AM

CarolineConcatto added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1815	Hey Sander, You are correct about Mask, but Kind is needed.
1816	Compiler complains that MVT is not an integer for the switch.
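The compiler complaint is expected for any class type that has no implicit conversion to an integral or enumeration type: a `switch` condition must be one of those. The usual way around it is to switch on the wrapped enum member instead. The sketch below uses a hypothetical stand-in, `ValueType`, rather than llvm::MVT itself; the member name `SimpleTy` mirrors the public member of llvm::MVT, but everything else here is illustrative only.

```cpp
#include <cassert>

// Hypothetical stand-in for a value-type wrapper; this is not llvm::MVT.
struct ValueType {
  enum Simple {
    nxv2i1, nxv4i1, nxv8i1, nxv16i1,
    nxv2i64, nxv4i32, nxv8i16, nxv16i8,
    Other
  };
  Simple SimpleTy;
  ValueType(Simple S) : SimpleTy(S) {}
};

ValueType promotedTypeForPredicate(ValueType M) {
  // `switch (M)` would not compile: ValueType has no conversion to an
  // integral or enumeration type. Switching on the enum member does.
  switch (M.SimpleTy) {
  case ValueType::nxv16i1:
    return ValueType::nxv16i8;
  case ValueType::nxv8i1:
    return ValueType::nxv8i16;
  case ValueType::nxv4i1:
    return ValueType::nxv4i32;
  case ValueType::nxv2i1:
    return ValueType::nxv2i64;
  default:
    return ValueType::Other;
  }
}
```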

Harbormaster completed remote builds in B111229: Diff 354811. Jun 28 2021, 4:32 AM

-Use switch to implement promote type
-Remove parameter Kind from getSpliceCost

Harbormaster completed remote builds in B111333: Diff 354966. Jun 28 2021, 2:16 PM

Thanks for the changes, this is looking better! Just left a few more nits.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1811	Can you move this out of the function into a separate `static MVT getPromotedTypeForPredicate` function? Perhaps we'll want to reuse this at a later point.
1847	If above you write:

	std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
	MVT PromotedVT = LT.second.getScalarType() == MVT::i1
	                     ? getPromotedTypeForPredicate(LT.second)
	                     : LT.second;

	then you can drop the IsPredicated and instead inline `PromotedVT.getScalarType() == MVT::i1` in the condition below.
1869	s/Ilegal/Illegal/
1871–1873	If LT.first is `unsigned`, the if-condition is redundant, you can write return LegalizationCost * LT.first; directly.

Create static MVT getPromotedTypeForPredicate function
Create a MVT PromotedVT

CarolineConcatto marked 2 inline comments as done. Jun 29 2021, 7:53 AM

CarolineConcatto added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1811	Is this what you were suggesting?

I spotted a few more things I missed in the last review, but I'm nearly happy.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1811	it was, thanks!
1827	nit: redundant whitespace.
1849	nit: `PromotedTy`
1853	I'm not sure if this matters, but for LegalVTy that's e.g. nxv16i8, the CondTy is nxv16i1, not LegalVTy.
1855	Should these two selects also be performed on `Promoted`?
1857	nit: add newline after this, and maybe add a one line comment saying that this implements the cost of the operation being performed on a promoted type.

Harbormaster completed remote builds in B111533: Diff 355241. Jun 29 2021, 10:39 AM

Replace the use of LegalVTy by PromotedVTy when computing the cost when Index<0
Add comments about promoted cost

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1855	I was considering using Promoted, but was not sure if it was correct. I've now changed it to use the promoted type.

Harbormaster completed remote builds in B111755: Diff 355551. Jun 30 2021, 8:52 AM

david-arm added inline comments. Jul 1 2021, 1:17 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
859	nit: Whitespace
1811	Just FYI there is actually already a function in AArch64ISelLowering.cpp that does something very similar:

	static inline EVT getPromotedVTForPredicate(EVT VT) {
	  assert(VT.isScalableVector() && (VT.getVectorElementType() == MVT::i1) &&
	         "Expected scalable predicate vector type!");
	  switch (VT.getVectorMinNumElements()) {
	  default:
	    llvm_unreachable("unexpected element count for vector");
	  case 2:
	    return MVT::nxv2i64;
	  case 4:
	    return MVT::nxv4i32;
	  case 8:
	    return MVT::nxv8i16;
	  case 16:
	    return MVT::nxv16i8;
	  }
	}

	I wonder if it's worth having a common routine in a header file?
1821	I think at the point we call this function the type has been legalised and split into LT.first (a multiple of a legal type) and LT.second (a legal type). So I think I'd expect the default case to be unreachable here perhaps?
1862	nit: Can you fix the formatting here please? Thanks!
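The four cases in the getPromotedVTForPredicate function quoted in the comment on line 1811 follow one pattern: the minimum lane count times the promoted element width is always 128 bits (nxv2i64, nxv4i32, nxv8i16, nxv16i8). A minimal sketch of that arithmetic follows; `promotedElementBits` is a hypothetical helper name and not an LLVM function.

```cpp
#include <cassert>

// Hypothetical helper: element width, in bits, of the integer vector that a
// scalable predicate with MinNumElts lanes is promoted to.
unsigned promotedElementBits(unsigned MinNumElts) {
  assert((MinNumElts == 2 || MinNumElts == 4 || MinNumElts == 8 ||
          MinNumElts == 16) &&
         "unexpected element count for vector");
  // Lanes times element width fills one 128-bit granule.
  return 128 / MinNumElts;
}
```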

Use getPromotedVTForPredicate from AArch64ISelLowering to compute the promoted type

Hey @david-arm,
I knew about getPromotedVTForPredicate, but was not sure it was a good idea to use it outside the class.
But since you suggested it in the review, I believe there is no problem in making it public.
I also think it is best to have only one place that determines the promoted type.
If not, let me know and I can revert the change and apply your suggestion to the previous function.

Harbormaster completed remote builds in B112209: Diff 356188. Jul 2 2021, 11:50 AM

LGTM

This revision is now accepted and ready to land. Jul 5 2021, 4:00 AM

Fix format in line 1852: const auto *Entry = CostTableLookup

This revision was landed with ongoing or failed builds. Jul 5 2021, 6:30 AM

Closed by commit rGa2c5c5605576: [AArch64][CostModel] Add cost model for experimental.vector.splice (authored by CarolineConcatto).

This revision was automatically updated to reflect the committed changes.

CarolineConcatto added a commit: rGa2c5c5605576: [AArch64][CostModel] Add cost model for experimental.vector.splice.

Harbormaster completed remote builds in B112441: Diff 356494. Jul 5 2021, 6:57 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	5 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	8 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	3 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	70 lines
llvm/test/Analysis/CostModel/AArch64/splice.ll	94 lines
llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll	148 lines

Diff 354811

llvm/include/llvm/Analysis/TargetTransformInfo.h

//===- TargetTransformInfo.h ------------------------------------- C++ --===//		//===- TargetTransformInfo.h ------------------------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
/// \file		/// \file
/// This pass exposes codegen information to IR-level passes. Every		/// This pass exposes codegen information to IR-level passes. Every
[... 843 lines hidden ...]	enum ShuffleKind {
SK_Select, ///< Selects elements from the corresponding lane of		SK_Select, ///< Selects elements from the corresponding lane of
///< either source operand. This is equivalent to a		///< either source operand. This is equivalent to a
///< vector select with a constant condition operand.		///< vector select with a constant condition operand.
SK_Transpose, ///< Transpose two vectors.		SK_Transpose, ///< Transpose two vectors.
SK_InsertSubvector, ///< InsertSubvector. Index indicates start offset.		SK_InsertSubvector, ///< InsertSubvector. Index indicates start offset.
SK_ExtractSubvector, ///< ExtractSubvector Index indicates start offset.		SK_ExtractSubvector, ///< ExtractSubvector Index indicates start offset.
SK_PermuteTwoSrc, ///< Merge elements from two source vectors into one		SK_PermuteTwoSrc, ///< Merge elements from two source vectors into one
///< with any shuffle mask.		///< with any shuffle mask.
SK_PermuteSingleSrc ///< Shuffle elements of single source vector with any		SK_PermuteSingleSrc, ///< Shuffle elements of single source vector with any
///< shuffle mask.		///< shuffle mask.
		SK_Splice ///< Concatenates elements from the first input vector
		///< with elements of the second input vector. Returning
		///< a vector of the same type as the input vectors.
};		};

/// Kind of the reduction data.		/// Kind of the reduction data.
enum ReductionKind {		enum ReductionKind {
RK_None, /// Not a reduction.		RK_None, /// Not a reduction.
RK_Arithmetic, /// Binary reduction data.		RK_Arithmetic, /// Binary reduction data.
RK_MinMax, /// Min/max reduction data.		RK_MinMax, /// Min/max reduction data.
RK_UnsignedMinMax, /// Unsigned min/max reduction data.		RK_UnsignedMinMax, /// Unsigned min/max reduction data.
[... 1,534 lines hidden ...]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[... 819 lines hidden ...]	case TTI::SK_PermuteTwoSrc:
return TTI::SK_Transpose;		return TTI::SK_Transpose;
break;		break;
case TTI::SK_Select:		case TTI::SK_Select:
case TTI::SK_Reverse:		case TTI::SK_Reverse:
case TTI::SK_Broadcast:		case TTI::SK_Broadcast:
case TTI::SK_Transpose:		case TTI::SK_Transpose:
case TTI::SK_InsertSubvector:		case TTI::SK_InsertSubvector:
case TTI::SK_ExtractSubvector:		case TTI::SK_ExtractSubvector:
		case TTI::SK_Splice:
break;		break;
}		}
return Kind;		return Kind;
}		}

InstructionCost getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp,		InstructionCost getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp,
ArrayRef<int> Mask, int Index,		ArrayRef<int> Mask, int Index,
VectorType *SubTp) {		VectorType *SubTp) {

switch (improveShuffleKindFromMask(Kind, Mask)) {		switch (improveShuffleKindFromMask(Kind, Mask)) {
case TTI::SK_Broadcast:		case TTI::SK_Broadcast:
return getBroadcastShuffleOverhead(cast<FixedVectorType>(Tp));		return getBroadcastShuffleOverhead(cast<FixedVectorType>(Tp));
case TTI::SK_Select:		case TTI::SK_Select:
		case TTI::SK_Splice:
case TTI::SK_Reverse:		case TTI::SK_Reverse:
case TTI::SK_Transpose:		case TTI::SK_Transpose:
case TTI::SK_PermuteSingleSrc:		case TTI::SK_PermuteSingleSrc:
case TTI::SK_PermuteTwoSrc:		case TTI::SK_PermuteTwoSrc:
return getPermuteShuffleOverhead(cast<FixedVectorType>(Tp));		return getPermuteShuffleOverhead(cast<FixedVectorType>(Tp));
case TTI::SK_ExtractSubvector:		case TTI::SK_ExtractSubvector:
return getExtractSubvectorOverhead(Tp, Index,		return getExtractSubvectorOverhead(Tp, Index,
cast<FixedVectorType>(SubTp));		cast<FixedVectorType>(SubTp));
[... 517 lines hidden ...]	case Intrinsic::experimental_vector_insert: {
TTI::SK_InsertSubvector, cast<VectorType>(Args[0]->getType()), None,		TTI::SK_InsertSubvector, cast<VectorType>(Args[0]->getType()), None,
Index, cast<VectorType>(Args[1]->getType()));		Index, cast<VectorType>(Args[1]->getType()));
}		}
case Intrinsic::experimental_vector_reverse: {		case Intrinsic::experimental_vector_reverse: {
return thisT()->getShuffleCost(TTI::SK_Reverse,		return thisT()->getShuffleCost(TTI::SK_Reverse,
cast<VectorType>(Args[0]->getType()), None,		cast<VectorType>(Args[0]->getType()), None,
0, cast<VectorType>(RetTy));		0, cast<VectorType>(RetTy));
}		}
		case Intrinsic::experimental_vector_splice: {
		unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
		return thisT()->getShuffleCost(TTI::SK_Splice,
		cast<VectorType>(Args[0]->getType()), None,
		Index, cast<VectorType>(RetTy));
		}
case Intrinsic::vector_reduce_add:		case Intrinsic::vector_reduce_add:
case Intrinsic::vector_reduce_mul:		case Intrinsic::vector_reduce_mul:
case Intrinsic::vector_reduce_and:		case Intrinsic::vector_reduce_and:
case Intrinsic::vector_reduce_or:		case Intrinsic::vector_reduce_or:
case Intrinsic::vector_reduce_xor:		case Intrinsic::vector_reduce_xor:
case Intrinsic::vector_reduce_smax:		case Intrinsic::vector_reduce_smax:
case Intrinsic::vector_reduce_smin:		case Intrinsic::vector_reduce_smin:
case Intrinsic::vector_reduce_fmax:		case Intrinsic::vector_reduce_fmax:
[... 800 lines hidden ...]

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

[... 160 lines hidden ...]	InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
bool IsPairwise, bool IsUnsigned,		bool IsPairwise, bool IsUnsigned,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getArithmeticReductionCostSVE(unsigned Opcode,		InstructionCost getArithmeticReductionCostSVE(unsigned Opcode,
VectorType *ValTy,		VectorType *ValTy,
bool IsPairwiseForm,		bool IsPairwiseForm,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

		InstructionCost getSpliceCost(TTI::ShuffleKind Kind, VectorType *Tp,
		int Index);

InstructionCost getArithmeticInstrCost(		InstructionCost getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
[... 134 lines hidden ...]

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

//===-- AArch64TargetTransformInfo.cpp - AArch64 specific TTI -------------===//		//===-- AArch64TargetTransformInfo.cpp - AArch64 specific TTI -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AArch64TargetTransformInfo.h"		#include "AArch64TargetTransformInfo.h"
[... 759 lines hidden ...]	InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,

EVT SrcTy = TLI->getValueType(DL, Src);		EVT SrcTy = TLI->getValueType(DL, Src);
EVT DstTy = TLI->getValueType(DL, Dst);		EVT DstTy = TLI->getValueType(DL, Dst);

if (!SrcTy.isSimple() || !DstTy.isSimple())		if (!SrcTy.isSimple() || !DstTy.isSimple())
return AdjustCost(		return AdjustCost(
BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));		BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));

static const TypeConversionCostTblEntry		static const TypeConversionCostTblEntry
ConversionTbl[] = {		ConversionTbl[] = {
{ ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 1 },		{ ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 1 },
{ ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, 0 },		{ ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, 0 },
{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },		{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },
{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 6 },		{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 6 },

// Truncations on nxvmiN		// Truncations on nxvmiN
{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i16, 1 },		{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i16, 1 },
{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 1 },		{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 1 },
{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 1 },		{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 1 },
{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 1 },		{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 1 },
{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 1 },		{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 1 },
{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 2 },		{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 2 },
{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 1 },		{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 1 },
{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 3 },		{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 3 },
{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 5 },		{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 5 },
		{ ISD::TRUNCATE, MVT::nxv16i1, MVT::nxv16i8, 1 },
{ ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 1 },		{ ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 1 },
{ ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 1 },		{ ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 1 },
{ ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 1 },		{ ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 1 },
{ ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 2 },		{ ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 2 },
{ ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 3 },		{ ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 3 },
{ ISD::TRUNCATE, MVT::nxv8i32, MVT::nxv8i64, 6 },		{ ISD::TRUNCATE, MVT::nxv8i32, MVT::nxv8i64, 6 },

// The number of shll instructions for the extension.		// The number of shll instructions for the extension.
[... 49 lines hidden ...]	ConversionTbl[] = {
// Complex: to v2f64		// Complex: to v2f64
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },


// LowerVectorFP_TO_INT		// LowerVectorFP_TO_INT
{ ISD::FP_TO_SINT, MVT::v2i32, MVT::v2f32, 1 },		{ ISD::FP_TO_SINT, MVT::v2i32, MVT::v2f32, 1 },
{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_SINT, MVT::v2i64, MVT::v2f64, 1 },		{ ISD::FP_TO_SINT, MVT::v2i64, MVT::v2f64, 1 },
{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },		{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },		{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },

[... 933 lines hidden ...]	AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,

if (const auto *Entry = CostTableLookup(CostTblNoPairwise, ISD, MTy))		if (const auto *Entry = CostTableLookup(CostTblNoPairwise, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm,		return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm,
CostKind);		CostKind);
}		}

		InstructionCost AArch64TTIImpl::getSpliceCost(TTI::ShuffleKind Kind,
		VectorType *Tp,
		int Index) {

		assert(Kind == TTI::SK_Splice && "Expected Kind Splice");
		auto getPromotedTypeForPredicate = [&](MVT M) {
		if (M == MVT::nxv16i1)
		return MVT::nxv16i8;
		if (M == MVT::nxv8i1)
		return MVT::nxv8i16;
		if (M == MVT::nxv4i1)
		return MVT::nxv4i32;
		if (M == MVT::nxv2i1)
		return MVT::nxv2i64;
		return MVT::Untyped;
		};

		static const CostTblEntry ShuffleTbl[] = {
		{ TTI::SK_Splice, MVT::nxv16i8, 1 },
		{ TTI::SK_Splice, MVT::nxv8i16, 1 },
		{ TTI::SK_Splice, MVT::nxv4i32, 1 },
		{ TTI::SK_Splice, MVT::nxv2i64, 1 },
		{ TTI::SK_Splice, MVT::nxv2f16, 1 },
		{ TTI::SK_Splice, MVT::nxv4f16, 1 },
		{ TTI::SK_Splice, MVT::nxv8f16, 1 },
		{ TTI::SK_Splice, MVT::nxv2bf16, 1 },
		{ TTI::SK_Splice, MVT::nxv4bf16, 1 },
		{ TTI::SK_Splice, MVT::nxv8bf16, 1 },
		{ TTI::SK_Splice, MVT::nxv2f32, 1 },
		{ TTI::SK_Splice, MVT::nxv4f32, 1 },
		{ TTI::SK_Splice, MVT::nxv2f64, 1 },
		};

		std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
		Type *LegalVTy = EVT(LT.second).getTypeForEVT(Tp->getContext());
		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
		InstructionCost LegalizationCost = 0;
		bool IsPredicated = (TLI->getValueType(DL, Tp, true).getScalarType() == MVT::i1);
		if (Index < 0) {
		LegalizationCost = getCmpSelInstrCost(Instruction::ICmp, LegalVTy, LegalVTy,
		CmpInst::BAD_ICMP_PREDICATE, CostKind) +
		getCmpSelInstrCost(Instruction::Select, LegalVTy,
		LegalVTy, CmpInst::BAD_ICMP_PREDICATE,
		CostKind);
		sdesmalen (Done): The compare is always an integer compare, i.e. `cmp ge <0, 1, 2, 3, ... N-1>, <idx, idx, idx, ... idx>`
		sdesmalen (Done): I'm not sure if this matters, but for a LegalVTy such as nxv16i8, the CondTy is nxv16i1, not LegalVTy.
		}

		sdesmalen (Done): Should these two selects also be performed on `Promoted`?
		CarolineConcatto (author, Done): I was considering using Promoted, but was not sure if it was correct. I've now changed it to use the promoted type.
		if (IsPredicated) {
		Type *Promote = EVT(getPromotedTypeForPredicate(LT.second)).
		Lint: Pre-merge checks: clang-format: please reformat the code
		sdesmalen (Done): nit: add a newline after this, and maybe add a one-line comment saying that this implements the cost of the operation being performed on a promoted type.
		getTypeForEVT(Tp->getContext());
		LegalizationCost += getCastInstrCost(Instruction::ZExt, Promote,
		LegalVTy, TTI::CastContextHint::None,
		CostKind) +
		getCastInstrCost(Instruction::Trunc, LegalVTy,
		david-arm (Done): nit: Can you fix the formatting here please? Thanks!
		Promote, TTI::CastContextHint::None,
		sdesmalen (Done): I think the cost lookup has to find an entry, otherwise we have an unhandled/illegal type. So instead of `if`, this should have an `assert` that Entry != nullptr.
		CostKind);
		}
		const auto *Entry = CostTableLookup(ShuffleTbl, Kind,
		Lint: Pre-merge checks: clang-format: please reformat the code
		(IsPredicated) ? getPromotedTypeForPredicate(LT.second)
		: LT.second);
		assert (Entry && "Ilegal Type for Splice");
		sdesmalen (Done): s/Ilegal/Illegal/
		LegalizationCost += Entry->Cost;
		if (LT.first > 1)
		return LegalizationCost * LT.first;
		return LegalizationCost;
		sdesmalen (Done): If LT.first is `unsigned`, the if-condition is redundant; you can write `return LegalizationCost * LT.first;` directly.
		}
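The structure of the helper above can be followed with a small standalone model. This is only a sketch: `SpliceQuery`, `modelSpliceCost` and the unit costs are hypothetical names and values chosen for illustration, whereas the patch obtains the real component costs from `getCmpSelInstrCost`, `getCastInstrCost` and the cost table. It also applies the reviewer's suggestion of multiplying by the split factor unconditionally.

```cpp
#include <cassert>

// Hypothetical stand-in for what the helper derives from the vector type.
struct SpliceQuery {
  unsigned NumParts; // plays the role of LT.first (legalization split factor)
  bool IsPredicate;  // true when the element type is i1
  int Index;         // splice offset; negative values count from the end
};

// Assumed unit costs, for illustration only; the real values come from the
// target's cost queries, not from constants like these.
constexpr unsigned kSpliceEntryCost = 1;
constexpr unsigned kCmpCost = 1;
constexpr unsigned kSelectCost = 1;
constexpr unsigned kZExtCost = 1;
constexpr unsigned kTruncCost = 1;

unsigned modelSpliceCost(const SpliceQuery &Q) {
  assert(Q.NumParts >= 1 && "a type legalizes to at least one part");
  unsigned Cost = kSpliceEntryCost;
  // A negative offset additionally pays for a compare and a select.
  if (Q.Index < 0)
    Cost += kCmpCost + kSelectCost;
  // A predicate is promoted, spliced, then truncated back.
  if (Q.IsPredicate)
    Cost += kZExtCost + kTruncCost;
  // Multiplying by one is the identity, so no `if (NumParts > 1)` is needed.
  return Cost * Q.NumParts;
}
```

Under these assumed unit costs the model gives 1 for a legal type with a positive offset, 3 for a predicate or for a negative offset, and doubles when the type is split in two, which is the same shape as the CHECK lines in sve-intrinsics.ll.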

InstructionCost AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,		InstructionCost AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
VectorType *Tp,		VectorType *Tp,
ArrayRef<int> Mask, int Index,		ArrayRef<int> Mask, int Index,
VectorType *SubTp) {		VectorType *SubTp) {
Kind = improveShuffleKindFromMask(Kind, Mask);		Kind = improveShuffleKindFromMask(Kind, Mask);
if (Kind == TTI::SK_Broadcast || Kind == TTI::SK_Transpose ||		if (Kind == TTI::SK_Broadcast || Kind == TTI::SK_Transpose ||
Kind == TTI::SK_Select || Kind == TTI::SK_PermuteSingleSrc ||		Kind == TTI::SK_Select || Kind == TTI::SK_PermuteSingleSrc ||
Kind == TTI::SK_Reverse) {		Kind == TTI::SK_Reverse) {
		static const CostTblEntry ShuffleTbl[] = {
{ TTI::SK_Reverse, MVT::nxv8bf16, 1 },		{ TTI::SK_Reverse, MVT::nxv8bf16, 1 },
{ TTI::SK_Reverse, MVT::nxv2f32, 1 },		{ TTI::SK_Reverse, MVT::nxv2f32, 1 },
{ TTI::SK_Reverse, MVT::nxv4f32, 1 },		{ TTI::SK_Reverse, MVT::nxv4f32, 1 },
{ TTI::SK_Reverse, MVT::nxv2f64, 1 },		{ TTI::SK_Reverse, MVT::nxv2f64, 1 },
{ TTI::SK_Reverse, MVT::nxv16i1, 1 },		{ TTI::SK_Reverse, MVT::nxv16i1, 1 },
{ TTI::SK_Reverse, MVT::nxv8i1, 1 },		{ TTI::SK_Reverse, MVT::nxv8i1, 1 },
{ TTI::SK_Reverse, MVT::nxv4i1, 1 },		{ TTI::SK_Reverse, MVT::nxv4i1, 1 },
{ TTI::SK_Reverse, MVT::nxv2i1, 1 },		{ TTI::SK_Reverse, MVT::nxv2i1, 1 },
};		};
		sdesmalen (Not Done): Separate from the type, I think we'll need to distinguish the costs based on the value of the index as well. Given two scalable vectors <x0, x1, x2, x3> and <y0, y1, y2, y3>: for positive offsets we can use SVE's EXT instruction. E.g. to splice at offset #1, the result of the splice will be <x1, x2, x3, y0>. For a negative offset, we can't use EXT, but we can instead use SPLICE, which requires (generating) a predicate. For a negative offset of 1 we need a predicate of <0, 0, 0, 1>. This means the operation can be done using whilelt+not+splice, so for negative offsets it would be more expensive.
std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);		std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
if (const auto *Entry = CostTableLookup(ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
}		}
		if (Kind == TTI::SK_Splice && isa<ScalableVectorType>(Tp))
		return getSpliceCost(Kind, Tp, Index);
return BaseT::getShuffleCost(Kind, Tp, Mask, Index, SubTp);		return BaseT::getShuffleCost(Kind, Tp, Mask, Index, SubTp);
}		}
		sdesmalen (Not Done): The predicates require two stores, a reload and an additional compare operation. Since predicates don't have a dedicated instruction, it should be fair to model the cost as that of two stores, a reload and a compare.
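
The index-dependent behaviour described in the inline comment on the shuffle table can be made concrete with a scalar reference model. The sketch below is not part of the patch: `spliceRef` and `trailingLaneMask` are hypothetical helpers, and plain `std::vector` stands in for a vector register. It only reproduces the element selection the reviewer describes, including the <0, 0, 0, 1> predicate for a negative offset of 1.

```cpp
#include <cassert>
#include <vector>

// Reference semantics of a splice: concatenate A and B, then take N
// consecutive elements. A non-negative Index starts at element Index of A;
// a negative Index starts -Index elements before the end of A.
std::vector<int> spliceRef(const std::vector<int> &A,
                           const std::vector<int> &B, int Index) {
  assert(A.size() == B.size() && "operands must have the same length");
  const int N = static_cast<int>(A.size());
  assert(Index >= -N && Index < N && "offset out of range");
  std::vector<int> Concat(A);
  Concat.insert(Concat.end(), B.begin(), B.end());
  const int Start = Index >= 0 ? Index : N + Index;
  return std::vector<int>(Concat.begin() + Start, Concat.begin() + Start + N);
}

// Lane mask selecting the trailing -Index lanes of the first operand, i.e.
// the kind of predicate a negative offset needs.
std::vector<bool> trailingLaneMask(int N, int Index) {
  assert(Index < 0 && -Index <= N && "only meaningful for negative offsets");
  std::vector<bool> Mask(N, false);
  for (int I = N + Index; I < N; ++I)
    Mask[I] = true;
  return Mask;
}
```

With A = {0, 1, 2, 3} standing for <x0, x1, x2, x3> and B = {10, 11, 12, 13} for <y0, y1, y2, y3>, an offset of 1 selects <x1, x2, x3, y0> and an offset of -1 selects <x3, y0, y1, y2>.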

llvm/test/Analysis/CostModel/AArch64/splice.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; RUN: opt < %s -analyze -cost-model -S -mtriple=aarch64--linux-gnu | FileCheck %s

				define void @vector_splice() #0 {
				; CHECK-LABEL: 'vector_splice'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 90 for instruction: %splice.v16i8 = call <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 180 for instruction: %splice.v32i8 = call <32 x i8> @llvm.experimental.vector.splice.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2i16 = call <2 x i16> @llvm.experimental.vector.splice.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4i16 = call <4 x i16> @llvm.experimental.vector.splice.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8i16 = call <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %splice.v16i16 = call <16 x i16> @llvm.experimental.vector.splice.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4i32 = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %splice.v8i32 = call <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2i64 = call <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %splice.v4i64 = call <4 x i64> @llvm.experimental.vector.splice.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2f16 = call <2 x half> @llvm.experimental.vector.splice.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4f16 = call <4 x half> @llvm.experimental.vector.splice.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8f16 = call <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %splice.v16f16 = call <16 x half> @llvm.experimental.vector.splice.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2f32 = call <2 x float> @llvm.experimental.vector.splice.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4f32 = call <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %splice.v8f32 = call <8 x float> @llvm.experimental.vector.splice.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2f64 = call <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %splice.v4f64 = call <4 x double> @llvm.experimental.vector.splice.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2bf16 = call <2 x bfloat> @llvm.experimental.vector.splice.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4bf16 = call <4 x bfloat> @llvm.experimental.vector.splice.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8bf16 = call <8 x bfloat> @llvm.experimental.vector.splice.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %splice.v16bf16 = call <16 x bfloat> @llvm.experimental.vector.splice.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 90 for instruction: %splice.v16i1 = call <16 x i1> @llvm.experimental.vector.splice.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8i1 = call <8 x i1> @llvm.experimental.vector.splice.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4i1 = call <4 x i1> @llvm.experimental.vector.splice.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2i1 = call <2 x i1> @llvm.experimental.vector.splice.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
				;
				%splice.v16i8 = call <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
				%splice.v32i8 = call <32 x i8> @llvm.experimental.vector.splice.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
				sdesmalen (Done): nv?
				%splice.v2i16 = call <2 x i16> @llvm.experimental.vector.splice.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
				%splice.v4i16 = call <4 x i16> @llvm.experimental.vector.splice.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
				%splice.v8i16 = call <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
				%splice.v16i16 = call <16 x i16> @llvm.experimental.vector.splice.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
				%splice.v4i32 = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
				%splice.v8i32 = call <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
				%splice.v2i64 = call <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
				%splice.v4i64 = call <4 x i64> @llvm.experimental.vector.splice.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
				%splice.v2f16 = call <2 x half> @llvm.experimental.vector.splice.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
				%splice.v4f16 = call <4 x half> @llvm.experimental.vector.splice.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
				%splice.v8f16 = call <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
				%splice.v16f16 = call <16 x half> @llvm.experimental.vector.splice.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
				%splice.v2f32 = call <2 x float> @llvm.experimental.vector.splice.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
				%splice.v4f32 = call <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
				%splice.v8f32 = call <8 x float> @llvm.experimental.vector.splice.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
				%splice.v2f64 = call <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
				%splice.v4f64 = call <4 x double> @llvm.experimental.vector.splice.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
				%splice.v2bf16 = call <2 x bfloat> @llvm.experimental.vector.splice.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
				%splice.v4bf16 = call <4 x bfloat> @llvm.experimental.vector.splice.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
				%splice.v8bf16 = call <8 x bfloat> @llvm.experimental.vector.splice.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
				%splice.v16bf16 = call <16 x bfloat> @llvm.experimental.vector.splice.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
				%splice.v16i1 = call <16 x i1> @llvm.experimental.vector.splice.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
				%splice.v8i1 = call <8 x i1> @llvm.experimental.vector.splice.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
				%splice.v4i1 = call <4 x i1> @llvm.experimental.vector.splice.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
				%splice.v2i1 = call <2 x i1> @llvm.experimental.vector.splice.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
				sdesmalen (Done): odd spaces.
				ret void
				}

				declare <2 x i1> @llvm.experimental.vector.splice.v2i1(<2 x i1>, <2 x i1>, i32)
				declare <4 x i1> @llvm.experimental.vector.splice.v4i1(<4 x i1>, <4 x i1>, i32)
				declare <8 x i1> @llvm.experimental.vector.splice.v8i1(<8 x i1>, <8 x i1>, i32)
				declare <16 x i1> @llvm.experimental.vector.splice.v16i1(<16 x i1>, <16 x i1>, i32)
				declare <2 x i8> @llvm.experimental.vector.splice.v2i8(<2 x i8>, <2 x i8>, i32)
				declare <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8>, <16 x i8>, i32)
				declare <32 x i8> @llvm.experimental.vector.splice.v32i8(<32 x i8>, <32 x i8>, i32)
				declare <2 x i16> @llvm.experimental.vector.splice.v2i16(<2 x i16>, <2 x i16>, i32)
				declare <4 x i16> @llvm.experimental.vector.splice.v4i16(<4 x i16>, <4 x i16>, i32)
				declare <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16>, <8 x i16>, i32)
				declare <16 x i16> @llvm.experimental.vector.splice.v16i16(<16 x i16>, <16 x i16>, i32)
				declare <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32>, <4 x i32>, i32)
				declare <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32>, <8 x i32>, i32)
				declare <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64>, <2 x i64>, i32)
				declare <4 x i64> @llvm.experimental.vector.splice.v4i64(<4 x i64>, <4 x i64>, i32)
				declare <2 x half> @llvm.experimental.vector.splice.v2f16(<2 x half>, <2 x half>, i32)
				declare <4 x half> @llvm.experimental.vector.splice.v4f16(<4 x half>, <4 x half>, i32)
				declare <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half>, <8 x half>, i32)
				declare <16 x half> @llvm.experimental.vector.splice.v16f16(<16 x half>, <16 x half>, i32)
				declare <2 x bfloat> @llvm.experimental.vector.splice.v2bf16(<2 x bfloat>, <2 x bfloat>, i32)
				declare <4 x bfloat> @llvm.experimental.vector.splice.v4bf16(<4 x bfloat>, <4 x bfloat>, i32)
				declare <8 x bfloat> @llvm.experimental.vector.splice.v8bf16(<8 x bfloat>, <8 x bfloat>, i32)
				declare <16 x bfloat> @llvm.experimental.vector.splice.v16bf16(<16 x bfloat>, <16 x bfloat>, i32)
				declare <2 x float> @llvm.experimental.vector.splice.v2f32(<2 x float>, <2 x float>, i32)
				declare <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float>, <4 x float>, i32)
				declare <8 x float> @llvm.experimental.vector.splice.v8f32(<8 x float>, <8 x float>, i32)
				declare <16 x float> @llvm.experimental.vector.splice.v16f32(<16 x float>, <16 x float>, i32)
				declare <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double>, <2 x double>, i32)
				declare <4 x double> @llvm.experimental.vector.splice.v4f64(<4 x double>, <4 x double>, i32)

				attributes #0 = { "target-features"="+bf16" }
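
The fixed-width costs expected in splice.ll follow a regular pattern that is easy to check by hand. The sketch below is an observation read off the CHECK lines, not a description of how the generic fallback computes them: `fixedSpliceCostPattern` is a hypothetical helper, and `NumParts` (the number of registers the type is assumed to be split into) is an assumption supplied by the caller.

```cpp
#include <cassert>

// Pattern observed in the expected costs: 6 * (elements per part - 1),
// multiplied by the number of parts the type is assumed to split into.
unsigned fixedSpliceCostPattern(unsigned NumElts, unsigned NumParts) {
  assert(NumParts >= 1 && NumElts % NumParts == 0 &&
         "elements must divide evenly across parts");
  const unsigned EltsPerPart = NumElts / NumParts;
  return NumParts * 6 * (EltsPerPart - 1);
}
```

For example, <16 x i8> taken as one part gives 6 * 15 = 90, and <32 x i8> taken as two parts of 16 elements gives 2 * 90 = 180, both of which agree with the CHECK lines above.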

llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll

	; RUN: opt -cost-model -analyze -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; RUN: opt < %s -analyze -cost-model -S -mtriple=aarch64--linux-gnu -mattr=+sve | FileCheck %s

	define void @vector_insert_extract(<vscale x 4 x i32> %v0, <vscale x 16 x i32> %v1, <16 x i32> %v2) {			define void @vector_insert_extract(<vscale x 4 x i32> %v0, <vscale x 16 x i32> %v1, <16 x i32> %v2) {
	; CHECK-LABEL: 'vector_insert_extract'			; CHECK-LABEL: 'vector_insert_extract'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.experimental.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.experimental.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)
	%extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)			%extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
	declare <vscale x 4 x float> @llvm.pow.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.pow.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.powi.nxv4f32.i32(<vscale x 4 x float>, i32)			declare <vscale x 4 x float> @llvm.powi.nxv4f32.i32(<vscale x 4 x float>, i32)
	declare <vscale x 4 x float> @llvm.exp.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.exp.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.log.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.log.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.log2.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.log2.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)

				define void @vector_splice() #0 {
				; CHECK-LABEL: 'vector_splice'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2f16 = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4f16 = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv8f16 = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16f16 = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2f32 = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4f32 = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8f32 = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2f64 = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4f64 = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2bf16 = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4bf16 = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv8bf16 = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16bf16 = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv16i8_neg = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv32i8_neg = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2i16_neg = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4i16_neg = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8i16_neg = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv16i16_neg = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4i32_neg = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv8i32_neg = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2i64_neg = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv4i64_neg = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2f16_neg = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4f16_neg = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8f16_neg = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv16f16_neg = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2f32_neg = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4f32_neg = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv8f32_neg = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2f64_neg = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv4f64_neg = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2bf16_neg = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4bf16_neg = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8bf16_neg = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv16bf16_neg = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv16i1_neg = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv8i1_neg = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)

				%splice_nxv16i8 = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
				%splice_nxv32i8 = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
				%splice_nxv2i16 = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
				%splice_nxv4i16 = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
				%splice_nxv8i16 = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
				%splice_nxv16i16 = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
				%splice_nxv4i32 = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
				%splice_nxv8i32 = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
				%splice_nxv2i64 = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
				%splice_nxv4i64 = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
				%splice_nxv2f16 = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
				%splice_nxv4f16 = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
				%splice_nxv8f16 = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
				%splice_nxv16f16 = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
				%splice_nxv2f32 = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
				%splice_nxv4f32 = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
				%splice_nxv8f32 = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
				%splice_nxv2f64 = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
				%splice_nxv4f64 = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
				%splice_nxv2bf16 = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
				%splice_nxv4bf16 = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
				%splice_nxv8bf16 = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
				%splice_nxv16bf16 = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
				%splice_nxv16i1 = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
				%splice_nxv8i1 = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
				%splice_nxv4i1 = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
				%splice_nxv2i1 = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
				;; negative Index
				%splice_nxv16i8_neg = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
				%splice_nxv32i8_neg = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
				%splice_nxv2i16_neg = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
				%splice_nxv4i16_neg = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
				%splice_nxv8i16_neg = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
				%splice_nxv16i16_neg = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
				%splice_nxv4i32_neg = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
				%splice_nxv8i32_neg = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
				%splice_nxv2i64_neg = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
				%splice_nxv4i64_neg = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
				%splice_nxv2f16_neg = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
				%splice_nxv4f16_neg = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
				%splice_nxv8f16_neg = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
				%splice_nxv16f16_neg = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
				%splice_nxv2f32_neg = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
				%splice_nxv4f32_neg = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
				%splice_nxv8f32_neg = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
				%splice_nxv2f64_neg = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
				%splice_nxv4f64_neg = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
				%splice_nxv2bf16_neg = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv4bf16_neg = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv8bf16_neg = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv16bf16_neg = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv16i1_neg = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 -1)
				%splice_nxv8i1_neg = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 -1)
				%splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
				%splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)
				ret void
				}

				declare <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1>, <vscale x 2 x i1>, i32)
				declare <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1>, <vscale x 4 x i1>, i32)
				declare <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1>, <vscale x 8 x i1>, i32)
				declare <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1>, <vscale x 16 x i1>, i32)
				declare <vscale x 2 x i8> @llvm.experimental.vector.splice.nxv2i8(<vscale x 2 x i8>, <vscale x 2 x i8>, i32)
				declare <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>, i32)
				declare <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8>, <vscale x 32 x i8>, i32)
				declare <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16>, <vscale x 2 x i16>, i32)
				declare <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16>, <vscale x 4 x i16>, i32)
				declare <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, i32)
				declare <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16>, <vscale x 16 x i16>, i32)
				declare <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, i32)
				declare <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32>, i32)
				declare <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, i32)
				declare <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64>, <vscale x 4 x i64>, i32)
				declare <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half>, <vscale x 2 x half>, i32)
				declare <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half>, <vscale x 4 x half>, i32)
				declare <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32)
				declare <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half>, <vscale x 16 x half>, i32)
				declare <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat>, <vscale x 2 x bfloat>, i32)
				declare <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat>, <vscale x 4 x bfloat>, i32)
				declare <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat>, <vscale x 8 x bfloat>, i32)
				declare <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat>, <vscale x 16 x bfloat>, i32)
				declare <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float>, <vscale x 2 x float>, i32)
				declare <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, i32)
				declare <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float>, <vscale x 8 x float>, i32)
				declare <vscale x 16 x float> @llvm.experimental.vector.splice.nxv16f32(<vscale x 16 x float>, <vscale x 16 x float>, i32)
				declare <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, i32)
				declare <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double>, <vscale x 4 x double>, i32)

				attributes #0 = { "target-features"="+sve,+bf16" }