This is an archive of the discontinued LLVM Phabricator instance.

Differential D104630

[AArch64][CostModel] Add cost model for experimental.vector.splice
Closed, Public

Authored by CarolineConcatto on Jun 21 2021, 3:37 AM.

Details

Reviewers

sdesmalen
RKSimon
bsmith
ABataev
david-arm

Commits

rGa2c5c5605576: [AArch64][CostModel] Add cost model for experimental.vector.splice

Summary

This patch adds a new ShuffleKind, SK_Splice, and then handles its cost in
getShuffleCost.
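For reference, llvm.experimental.vector.splice takes two vectors of the same type plus a signed i32 immediate. As the LangRef describes it, the immediate (modulo the element count) is the start index into the concatenation of the two inputs, so a negative immediate -k keeps the last k elements of the first vector. The sketch below models that on fixed-length std::vector values. spliceRef is a hypothetical helper for illustration, not LLVM code, and it assumes -N <= Imm < N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference model of llvm.experimental.vector.splice on
// fixed-length vectors. Assumes both inputs have N lanes and -N <= Imm < N.
std::vector<int> spliceRef(const std::vector<int> &V1,
                           const std::vector<int> &V2, int Imm) {
  const int N = static_cast<int>(V1.size());
  assert(V2.size() == V1.size() && -N <= Imm && Imm < N);
  // The signed immediate, taken modulo N, is the start index into
  // concat(V1, V2); a negative Imm therefore keeps the last -Imm lanes of V1.
  const int Start = ((Imm % N) + N) % N;
  std::vector<int> Result;
  Result.reserve(V1.size());
  for (int I = 0; I < N; ++I) {
    const int Lane = Start + I; // index into the 2N-lane concatenation
    Result.push_back(Lane < N ? V1[Lane] : V2[Lane - N]);
  }
  return Result;
}
```

With V1 = {1, 2, 3, 4} and V2 = {5, 6, 7, 8}, Imm = 1 and Imm = -3 select the same lanes, {2, 3, 4, 5}.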

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

CarolineConcatto created this revision. Jun 21 2021, 3:37 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Jun 21 2021, 3:37 AM

CarolineConcatto requested review of this revision. Jun 21 2021, 3:37 AM

Herald added a project: Restricted Project. Jun 21 2021, 3:37 AM

Herald added a subscriber: llvm-commits.

The cost for fixed-length vectors with splice could be improved by changing

InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy)

to

InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy, int Index)

and making the loop run up to the index instead of over all elements.
But that depends on whether it is fine to create a new shuffle kind, SK_Splice, for experimental.vector.splice.

CarolineConcatto edited the summary of this revision. Jun 21 2021, 3:50 AM

CarolineConcatto added reviewers: sdesmalen, RKSimon, bsmith, ABataev, david-arm.

Harbormaster completed remote builds in B110163: Diff 353323. Jun 21 2021, 6:19 AM

Matt added a subscriber: Matt. Jun 21 2021, 8:51 AM

sdesmalen added inline comments. Jun 22 2021, 1:43 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2041	Apart from the type, I think we'll also need to distinguish the costs based on the value of the index. Take two scalable vectors <x0, x1, x2, x3> and <y0, y1, y2, y3>. For positive offsets we can use SVE's EXT instruction; e.g. splicing at offset #1 gives <x1, x2, x3, y0>. For a negative offset we can't use EXT, but we can use SPLICE instead, which requires (generating) a predicate. For an offset of -1 we need the predicate <0, 0, 0, 1>. This means the operation can be done using whilelt+not+splice, so negative offsets would be more expensive.
2042–2054	At the moment, the costs for these are actually quite high because they're expanded to two stores and one reload. That said, I'd prefer not to reflect that in the cost model, because this is not the desired code-gen and we should favour getting more scalable vectorization to get more testing coverage.
2055–2058	The predicates require two stores, a reload and an additional compare operation. Since there is no dedicated splice instruction for predicates, it should be fair to model the cost as that of two stores, a reload and a compare.
llvm/test/Analysis/CostModel/AArch64/splice.ll
35	nv?
60	odd spaces.
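To make the positive/negative offset distinction in the comment on line 2041 concrete, the sketch below emulates both lowering strategies on a 4-lane array. It is an illustration under stated assumptions, not the real instruction semantics in full: EXT is modelled lane-wise (the real SVE EXT takes a byte immediate), and SPLICE is modelled as copying the first source's lanes from the first to the last active predicate lane into the low lanes of the result, then filling the rest from the low lanes of the second source. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustration only: a 4-lane "vector" stands in for a scalable SVE vector.
constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Pred = std::array<bool, N>;

// EXT-style extraction: N consecutive lanes of concat(A, B) starting at lane
// Offset (assumes Offset < N; lanes are used instead of a byte immediate).
Vec extModel(const Vec &A, const Vec &B, std::size_t Offset) {
  Vec R{};
  for (std::size_t I = 0; I < N; ++I) {
    const std::size_t L = Offset + I;
    R[I] = L < N ? A[L] : B[L - N];
  }
  return R;
}

// WHILELT-style predicate: lane I is active while I < Limit.
Pred whileltModel(std::size_t Limit) {
  Pred P{};
  for (std::size_t I = 0; I < N; ++I)
    P[I] = I < Limit;
  return P;
}

// Predicate NOT: invert every lane.
Pred notModel(const Pred &P) {
  Pred R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = !P[I];
  return R;
}

// SPLICE-style merge: copy A's lanes from the first to the last active lane
// into the low lanes of the result, then fill the rest from B's low lanes.
Vec spliceModel(const Pred &P, const Vec &A, const Vec &B) {
  std::size_t First = N, Last = 0;
  for (std::size_t I = 0; I < N; ++I) {
    if (P[I]) {
      if (First == N)
        First = I;
      Last = I;
    }
  }
  Vec R{};
  std::size_t Out = 0;
  if (First != N)
    for (std::size_t I = First; I <= Last; ++I)
      R[Out++] = A[I];
  for (std::size_t I = 0; Out < N; ++I)
    R[Out++] = B[I];
  return R;
}

// Offset -K (1 <= K <= N): keep the last K lanes of A via whilelt+not+splice.
Vec negativeOffsetSplice(const Vec &A, const Vec &B, std::size_t K) {
  return spliceModel(notModel(whileltModel(N - K)), A, B);
}
```

With A = {0, 1, 2, 3} and B = {4, 5, 6, 7}, an offset of -1 builds the predicate <0, 0, 0, 1> and yields {3, 4, 5, 6}. An offset of -3 yields the same lanes as EXT at +1, yet in this model it still goes through the extra whilelt and not, which is the asymmetry the comment suggests the cost model should reflect.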

Any reason you didn't use the update_analyze_test_checks.py script?

Improve cost for scalable vector

@RKSimon I have changed the RUN line so that it is accepted by update_analyze_test_checks.py.
I did not run the script on the sve-intrinsics.ll file, but the CHECK lines for splice were generated by update_analyze_test_checks.py.

@sdesmalen The cost now takes the index into account, and it is different when the scalar type is i1.
For a negative index there is a predicate mask, plus a compare and a select instruction to choose the correct elements.
That is the reason it uses getCmpSelInstrCost.
For predicate vectors there is a table with the combined cost of promoting and truncating.
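Put together, the description above amounts to an additive model. The sketch below is a hypothetical, LLVM-free version of that arithmetic: the per-operation costs stand in for what getCmpSelInstrCost, the promote/truncate costs and the splice cost table would return (all set to 1 here purely as placeholders), and NumParts plays the role of LT.first, the number of legal vectors the type is split into.

```cpp
#include <cassert>

// Hypothetical per-operation costs; placeholders, not real AArch64 numbers.
struct SpliceCostInputs {
  unsigned SpliceCost = 1; // the splice itself on a legal vector
  unsigned CmpCost = 1;    // compare that builds the predicate mask
  unsigned SelectCost = 1; // select that picks the lanes
  unsigned ExtCost = 1;    // promote an i1 predicate to integer lanes
  unsigned TruncCost = 1;  // truncate the result back to i1 lanes
};

// NumParts models LT.first: how many legal vectors the type is split into.
unsigned modelSpliceCost(const SpliceCostInputs &C, int Index,
                         bool IsPredicate, unsigned NumParts) {
  unsigned Cost = 0;
  if (Index < 0) // negative offsets need a predicate mask
    Cost += C.CmpCost + C.SelectCost;
  if (IsPredicate) // predicate splices are performed on a promoted type
    Cost += C.ExtCost + C.TruncCost;
  Cost += C.SpliceCost;
  return Cost * NumParts;
}
```

With every placeholder cost at 1, a positive index on a legal non-predicate type costs 1, a negative index adds the compare and select (3), and a negative index on a predicate type adds the extend and truncate as well (5).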

sdesmalen added inline comments. Jun 25 2021, 6:32 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1906	Is it even necessary to pass a Kind or Mask in the first place? They seem unused.
1907	Could this be a switch statement instead? Also, how about giving it a name like `getPromotedTypeForPredicate`?
1934	Couldn't this just use `getCastInstrCost` instead of the custom table?
1944	The compare is always an integer compare, i.e. `cmp ge <0, 1, 2, 3, ... N-1>, <idx, idx, idx, ... idx>`.
1954	I think the lookup has to find an entry; otherwise we have an unhandled/illegal type. So instead of an `if`, this should `assert` that Entry != nullptr.
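The last suggestion relies on the table lookup returning a null pointer when no entry matches, so an assert documents that every legal type must be present. The sketch below mirrors that pattern with a hypothetical, self-contained table keyed by a kind id and a type name; it is not LLVM's CostTableLookup from llvm/CodeGen/CostTable.h, which is keyed by an int and an MVT.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical stand-in for a cost table entry: (kind, type name, cost).
struct Entry {
  int Kind;
  std::string Type;
  unsigned Cost;
};

// Returns a pointer to the matching entry, or nullptr if there is none.
const Entry *lookup(const Entry *Begin, const Entry *End, int Kind,
                    const std::string &Type) {
  const Entry *It = std::find_if(Begin, End, [&](const Entry &E) {
    return E.Kind == Kind && E.Type == Type;
  });
  return It == End ? nullptr : It;
}

constexpr int SpliceKind = 0; // hypothetical kind id

unsigned spliceTableCost(const std::string &LegalType) {
  static const Entry Table[] = {
      {SpliceKind, "nxv16i8", 1},
      {SpliceKind, "nxv4i32", 1},
  };
  const Entry *E =
      lookup(std::begin(Table), std::end(Table), SpliceKind, LegalType);
  // Every legal type must have an entry; a miss means an unhandled type.
  assert(E && "Illegal type for splice");
  return E->Cost;
}
```

The design choice is that a missing entry is a programming error rather than a case to fall back from silently, so it fails loudly in assertion-enabled builds.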

Harbormaster completed remote builds in B110974: Diff 354469. Jun 25 2021, 7:34 AM

Address Sander's comment

CarolineConcatto marked 2 inline comments as done. Jun 28 2021, 2:24 AM

CarolineConcatto added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1906	Hey Sander, you are correct about Mask, but Kind is needed.
1907	The compiler complains that MVT is not an integer for the switch.
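Some background on that error: a C++ switch condition must have integral or enumeration type (or a class type with a single non-explicit conversion to one), so switching directly on a class object is rejected. A common workaround, which LLVM code often uses with MVT via its SimpleTy member, is to switch on the wrapped enum; another is to switch on an integer property such as the minimum element count. The sketch below shows the first idea with a hypothetical miniature type, not the real MVT.

```cpp
#include <cassert>

// Hypothetical miniature of a value-type wrapper: a class holding an enum.
struct MiniVT {
  enum SimpleValueType { nxv2i1, nxv4i1, nxv8i1, nxv16i1,
                         nxv2i64, nxv4i32, nxv8i16, nxv16i8, Invalid };
  SimpleValueType SimpleTy = Invalid;
};

// `switch (VT)` would not compile: MiniVT is a class with no conversion to
// an integral type. Switching on the enum member compiles fine.
MiniVT promotePredicate(MiniVT VT) {
  MiniVT R;
  switch (VT.SimpleTy) {
  case MiniVT::nxv2i1:  R.SimpleTy = MiniVT::nxv2i64; break;
  case MiniVT::nxv4i1:  R.SimpleTy = MiniVT::nxv4i32; break;
  case MiniVT::nxv8i1:  R.SimpleTy = MiniVT::nxv8i16; break;
  case MiniVT::nxv16i1: R.SimpleTy = MiniVT::nxv16i8; break;
  default:              R.SimpleTy = MiniVT::Invalid; break;
  }
  return R;
}
```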

Harbormaster completed remote builds in B111229: Diff 354811. Jun 28 2021, 4:32 AM

-Use a switch to compute the promoted type
-Remove parameter Kind from getSpliceCost

Harbormaster completed remote builds in B111333: Diff 354966. Jun 28 2021, 2:16 PM

Thanks for the changes, this is looking better! Just left a few more nits.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1902	Can you move this out of the function into a separate `static MVT getPromotedTypeForPredicate` function? Perhaps we'll want to reuse this at a later point.
1938	If you write this above:
std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
MVT PromotedVT = LT.second.getScalarType() == MVT::i1
                     ? getPromotedTypeForPredicate(LT.second)
                     : LT.second;
then you can drop IsPredicated and instead inline `PromotedVT.getScalarType() == MVT::i1` in the condition below.
1960	s/Ilegal/Illegal/
1962–1964	If LT.first is `unsigned`, the if-condition is redundant, you can write return LegalizationCost * LT.first; directly.

Create static MVT getPromotedTypeForPredicate function
Create an MVT PromotedVT

CarolineConcatto marked 2 inline comments as done. Jun 29 2021, 7:53 AM

CarolineConcatto added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1902	Is this what you were suggesting?

I spotted a few more things I missed in the last review, but I'm nearly happy.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1902	it was, thanks!
1918	nit: redundant whitespace.
1940	nit: `PromotedTy`
1944	I'm not sure if this matters, but for LegalVTy that's e.g. nxv16i8, the CondTy is nxv16i1, not LegalVTy.
1946	Should these two selects also be performed on `Promoted`?
1948	nit: add newline after this, and maybe add a one line comment saying that this implements the cost of the operation being performed on a promoted type.

Harbormaster completed remote builds in B111533: Diff 355241. Jun 29 2021, 10:39 AM

Replace the use of LegalVTy with PromotedVTy when computing the cost for Index < 0
Add comments about the promoted cost

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1946	I was considering using Promoted, but was not sure it was correct. I've now changed it to use the promoted type.

Harbormaster completed remote builds in B111755: Diff 355551. Jun 30 2021, 8:52 AM

david-arm added inline comments. Jul 1 2021, 1:17 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
907	nit: Whitespace
1902	Just FYI, there is actually already a function in AArch64ISelLowering.cpp that does something very similar:
static inline EVT getPromotedVTForPredicate(EVT VT) {
  assert(VT.isScalableVector() && (VT.getVectorElementType() == MVT::i1) &&
         "Expected scalable predicate vector type!");
  switch (VT.getVectorMinNumElements()) {
  default:
    llvm_unreachable("unexpected element count for vector");
  case 2:
    return MVT::nxv2i64;
  case 4:
    return MVT::nxv4i32;
  case 8:
    return MVT::nxv8i16;
  case 16:
    return MVT::nxv16i8;
  }
}
I wonder if it's worth having a common routine in a header file?
1912	I think at the point we call this function the type has been legalised and split into LT.first (a multiple of a legal type) and LT.second (a legal type). So I think I'd expect the default case to be unreachable here perhaps?
1953	nit: Can you fix the formatting here please? Thanks!
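The mapping in that helper follows a simple rule: a scalable predicate with N minimum lanes is promoted to the integer vector whose N minimum lanes fill one 128-bit SVE granule, so the promoted element width is 128 / N bits. The sketch below checks that rule against the four cases listed in the comment; promotedElementBits is a hypothetical name, and only the element width is modelled, not LLVM's EVT.

```cpp
#include <cassert>

// Hypothetical helper: element width (in bits) of the integer vector that a
// scalable predicate with MinNumElts lanes is promoted to. A scalable SVE
// vector has a 128-bit minimum size, so MinNumElts * width == 128.
unsigned promotedElementBits(unsigned MinNumElts) {
  switch (MinNumElts) {
  case 2:
  case 4:
  case 8:
  case 16:
    return 128 / MinNumElts;
  default:
    assert(false && "unexpected element count for predicate vector");
    return 0;
  }
}
```

So nxv2i1 maps to 64-bit lanes (nxv2i64), nxv4i1 to 32-bit, nxv8i1 to 16-bit and nxv16i1 to 8-bit, matching the switch quoted above.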

Use getPromotedVTForPredicate from AArch64ISelLowering to compute the promoted type

Hey @david-arm
I knew about getPromotedVTForPredicate, but was not sure it was a good idea to use it outside the class.
But since you suggested it in the review, I believe there is no problem in making it public.
I also think it is best to have only one place that computes the promoted type.
If not, let me know and I can revert the change and apply your suggestion to the previous function.

Harbormaster completed remote builds in B112209: Diff 356188. Jul 2 2021, 11:50 AM

LGTM

This revision is now accepted and ready to land. Jul 5 2021, 4:00 AM

Fix format in line 1852: const auto *Entry = CostTableLookup

This revision was landed with ongoing or failed builds. Jul 5 2021, 6:30 AM

Closed by commit rGa2c5c5605576: [AArch64][CostModel] Add cost model for experimental.vector.splice (authored by CarolineConcatto).

This revision was automatically updated to reflect the committed changes.

CarolineConcatto added a commit: rGa2c5c5605576: [AArch64][CostModel] Add cost model for experimental.vector.splice.

Harbormaster completed remote builds in B112441: Diff 356494. Jul 5 2021, 6:57 AM

Revision Contents

llvm/include/llvm/Analysis/TargetTransformInfo.h (5 lines)
llvm/include/llvm/CodeGen/BasicTTIImpl.h (8 lines)
llvm/lib/Target/AArch64/AArch64ISelLowering.h (1 line)
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (4 lines)
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (2 lines)
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (53 lines)
llvm/test/Analysis/CostModel/AArch64/splice.ll (94 lines)
llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll (148 lines)

Diff 356497

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ … @@ enum ShuffleKind {
     SK_Select,           ///< Selects elements from the corresponding lane of
                          ///< either source operand. This is equivalent to a
                          ///< vector select with a constant condition operand.
     SK_Transpose,        ///< Transpose two vectors.
     SK_InsertSubvector,  ///< InsertSubvector. Index indicates start offset.
     SK_ExtractSubvector, ///< ExtractSubvector Index indicates start offset.
     SK_PermuteTwoSrc,    ///< Merge elements from two source vectors into one
                          ///< with any shuffle mask.
-    SK_PermuteSingleSrc  ///< Shuffle elements of single source vector with any
+    SK_PermuteSingleSrc, ///< Shuffle elements of single source vector with any
                          ///< shuffle mask.
+    SK_Splice            ///< Concatenates elements from the first input vector
+                         ///< with elements of the second input vector. Returning
+                         ///< a vector of the same type as the input vectors.
   };
 
   /// Kind of the reduction data.
   enum ReductionKind {
     RK_None,           /// Not a reduction.
     RK_Arithmetic,     /// Binary reduction data.
     RK_MinMax,         /// Min/max reduction data.
     RK_UnsignedMinMax, /// Unsigned min/max reduction data.

llvm/include/llvm/CodeGen/BasicTTIImpl.h

@@ … @@ case TTI::SK_PermuteTwoSrc:
         return TTI::SK_Transpose;
       break;
     case TTI::SK_Select:
     case TTI::SK_Reverse:
     case TTI::SK_Broadcast:
     case TTI::SK_Transpose:
     case TTI::SK_InsertSubvector:
     case TTI::SK_ExtractSubvector:
+    case TTI::SK_Splice:
       break;
     }
     return Kind;
   }
 
   InstructionCost getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp,
                                  ArrayRef<int> Mask, int Index,
                                  VectorType *SubTp) {
 
     switch (improveShuffleKindFromMask(Kind, Mask)) {
     case TTI::SK_Broadcast:
       return getBroadcastShuffleOverhead(cast<FixedVectorType>(Tp));
     case TTI::SK_Select:
+    case TTI::SK_Splice:
     case TTI::SK_Reverse:
     case TTI::SK_Transpose:
     case TTI::SK_PermuteSingleSrc:
     case TTI::SK_PermuteTwoSrc:
       return getPermuteShuffleOverhead(cast<FixedVectorType>(Tp));
     case TTI::SK_ExtractSubvector:
       return getExtractSubvectorOverhead(Tp, Index,
                                          cast<FixedVectorType>(SubTp));
@@ … @@ case Intrinsic::experimental_vector_insert: {
           TTI::SK_InsertSubvector, cast<VectorType>(Args[0]->getType()), None,
           Index, cast<VectorType>(Args[1]->getType()));
     }
     case Intrinsic::experimental_vector_reverse: {
       return thisT()->getShuffleCost(TTI::SK_Reverse,
                                      cast<VectorType>(Args[0]->getType()), None,
                                      0, cast<VectorType>(RetTy));
     }
+    case Intrinsic::experimental_vector_splice: {
+      unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
+      return thisT()->getShuffleCost(TTI::SK_Splice,
+                                     cast<VectorType>(Args[0]->getType()), None,
+                                     Index, cast<VectorType>(RetTy));
+    }
     case Intrinsic::vector_reduce_add:
     case Intrinsic::vector_reduce_mul:
     case Intrinsic::vector_reduce_and:
     case Intrinsic::vector_reduce_or:
     case Intrinsic::vector_reduce_xor:
     case Intrinsic::vector_reduce_smax:
     case Intrinsic::vector_reduce_smin:
     case Intrinsic::vector_reduce_fmax:
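One detail in the BasicTTIImpl.h hunk: the splice immediate is an i32 that may be negative, yet it is read with getZExtValue() into an unsigned before being passed on as the int Index parameter of getShuffleCost. The sketch below models that chain with fixed-width integers (no LLVM types), assuming 32-bit unsigned and int as on common platforms. Zero-extending -1 gives 0xFFFFFFFF, and converting that back to int yields -1 again (modular conversion, guaranteed since C++20 and implementation-defined but two's-complement in practice before that), which is why the Index < 0 check in getSpliceCost still sees negative offsets. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models ConstantInt::getZExtValue() on an i32 immediate: the 32-bit pattern
// is zero-extended to 64 bits.
std::uint64_t zextI32(std::int32_t Imm) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(Imm));
}

// Models `unsigned Index = ...->getZExtValue();` followed by passing Index to
// a parameter of type int. Assumes 32-bit unsigned and int.
int indexSeenByCostHook(std::int32_t Imm) {
  unsigned Index = static_cast<unsigned>(zextI32(Imm)); // keeps low 32 bits
  return static_cast<int>(Index); // modular conversion back to signed
}
```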

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ … @@ public:
   // If the platform/function should have a redzone, return the size in bytes.
   unsigned getRedZoneSize(const Function &F) const {
     if (F.hasFnAttribute(Attribute::NoRedZone))
       return 0;
     return 128;
   }
 
   bool isAllActivePredicate(SDValue N) const;
+  EVT getPromotedVTForPredicate(EVT VT) const;
 
 private:
   /// Keep a pointer to the AArch64Subtarget around so that we can
   /// make the right decision when generating code for different targets.
   const AArch64Subtarget *Subtarget;
 
   bool isExtFreeImpl(const Instruction *Ext) const override;
 

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ … @@ SDValue AArch64TargetLowering::getSVESafeBitCast(EVT VT, SDValue Op,
 
   return Op;
 }
 
 bool AArch64TargetLowering::isAllActivePredicate(SDValue N) const {
   return ::isAllActivePredicate(N);
 }
 
+EVT AArch64TargetLowering::getPromotedVTForPredicate(EVT VT) const {
+  return ::getPromotedVTForPredicate(VT);
+}
+
 bool AArch64TargetLowering::SimplifyDemandedBitsForTargetNode(
     SDValue Op, const APInt &OriginalDemandedBits,
     const APInt &OriginalDemandedElts, KnownBits &Known, TargetLoweringOpt &TLO,
     unsigned Depth) const {
 
   unsigned Opc = Op.getOpcode();
   switch (Opc) {
   case AArch64ISD::VSHL: {

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

@@ … @@ InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
                                          bool IsPairwise, bool IsUnsigned,
                                          TTI::TargetCostKind CostKind);
 
   InstructionCost getArithmeticReductionCostSVE(unsigned Opcode,
                                                 VectorType *ValTy,
                                                 bool IsPairwiseForm,
                                                 TTI::TargetCostKind CostKind);
 
+  InstructionCost getSpliceCost(VectorType *Tp, int Index);
+
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
       TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
       TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
       TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
       TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
       ArrayRef<const Value *> Args = ArrayRef<const Value *>(),

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Show First 20 Lines • Show All 833 Lines • ▼ Show 20 Lines	ConversionTbl[] = {
{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 1 },		{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 1 },
{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 1 },		{ ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 1 },
{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 1 },		{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 1 },
{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 1 },		{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 1 },
{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 2 },		{ ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 2 },
{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 1 },		{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 1 },
{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 3 },		{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 3 },
{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 5 },		{ ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 5 },
		{ ISD::TRUNCATE, MVT::nxv16i1, MVT::nxv16i8, 1 },
{ ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 1 },		{ ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 1 },
{ ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 1 },		{ ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 1 },
{ ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 1 },		{ ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 1 },
{ ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 2 },		{ ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 2 },
{ ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 3 },		{ ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 3 },
{ ISD::TRUNCATE, MVT::nxv8i32, MVT::nxv8i64, 6 },		{ ISD::TRUNCATE, MVT::nxv8i32, MVT::nxv8i64, 6 },

// The number of shll instructions for the extension.		// The number of shll instructions for the extension.
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	ConversionTbl[] = {
// Complex: to v2f64		// Complex: to v2f64
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },
{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },		{ ISD::SINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i8, 4 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i16, 4 },
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },		{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i32, 2 },


david-armUnsubmitted Done Reply Inline Actions nit: Whitespace david-arm: nit: Whitespace
// LowerVectorFP_TO_INT		// LowerVectorFP_TO_INT
{ ISD::FP_TO_SINT, MVT::v2i32, MVT::v2f32, 1 },		{ ISD::FP_TO_SINT, MVT::v2i32, MVT::v2f32, 1 },
{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_SINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_SINT, MVT::v2i64, MVT::v2f64, 1 },		{ ISD::FP_TO_SINT, MVT::v2i64, MVT::v2f64, 1 },
{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },		{ ISD::FP_TO_UINT, MVT::v2i32, MVT::v2f32, 1 },
{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },		{ ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f32, 1 },
{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },		{ ISD::FP_TO_UINT, MVT::v2i64, MVT::v2f64, 1 },

▲ Show 20 Lines • Show All 976 Lines • ▼ Show 20 Lines	if (!ValVTy->getElementType()->isIntegerTy(1) &&
return Entry->Cost + ExtraCost;		return Entry->Cost + ExtraCost;
}		}
break;		break;
}		}
return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm,		return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm,
CostKind);		CostKind);
}		}

		InstructionCost AArch64TTIImpl::getSpliceCost(VectorType *Tp, int Index) {
		static const CostTblEntry ShuffleTbl[] = {
		sdesmalenUnsubmitted Done Reply Inline Actions Can you move this out of the function into a separate `static MVT getPromotedTypeForPredicate` function? Perhaps we'll want to reuse this at a later point. sdesmalen: Can you move this out of the function into a separate `static MVT getPromotedTypeForPredicate`…
		CarolineConcattoAuthorUnsubmitted Done Reply Inline Actions Is this what you were suggesting? CarolineConcatto: Is this what you were suggesting?
		sdesmalenUnsubmitted Not Done Reply Inline Actions it was, thanks! sdesmalen: it was, thanks!
		david-armUnsubmitted Not Done Reply Inline Actions Just FYI there is actually already a function in AArch64ISelLowering.cpp that does something very similar: static inline EVT getPromotedVTForPredicate(EVT VT) { assert(VT.isScalableVector() && (VT.getVectorElementType() == MVT::i1) && "Expected scalable predicate vector type!"); switch (VT.getVectorMinNumElements()) { default: llvm_unreachable("unexpected element count for vector"); case 2: return MVT::nxv2i64; case 4: return MVT::nxv4i32; case 8: return MVT::nxv8i16; case 16: return MVT::nxv16i8; } } I wonder if it's worth having a common routine in a header file? david-arm: Just FYI there is actually already a function in AArch64ISelLowering.cpp that does something…
		{ TTI::SK_Splice, MVT::nxv16i8, 1 },
		{ TTI::SK_Splice, MVT::nxv8i16, 1 },
		{ TTI::SK_Splice, MVT::nxv4i32, 1 },
		{ TTI::SK_Splice, MVT::nxv2i64, 1 },
		sdesmalenUnsubmitted Done Reply Inline Actions is it even needed to pass a Kind or Mask in the first place, they seem unused. sdesmalen: is it even needed to pass a Kind or Mask in the first place, they seem unused.
		CarolineConcattoAuthorUnsubmitted Done Reply Inline Actions Hey Sander, You are correct about Mask, but Kind is needed. CarolineConcatto: Hey Sander, You are correct about Mask, but Kind is needed.
		{ TTI::SK_Splice, MVT::nxv2f16, 1 },
		sdesmalenUnsubmitted Done Reply Inline Actions This can be a switch statement instead? Also, how about giving the a name like `getPromotedTypeForPredicate` ? sdesmalen: This can be a switch statement instead? Also, how about giving the a name like…
		CarolineConcattoAuthorUnsubmitted Done Reply Inline Actions Compiler complains that MVN is not an integer for the switch. CarolineConcatto: Compiler complains that MVN is not an integer for the switch.
		{ TTI::SK_Splice, MVT::nxv4f16, 1 },
		{ TTI::SK_Splice, MVT::nxv8f16, 1 },
		{ TTI::SK_Splice, MVT::nxv2bf16, 1 },
		{ TTI::SK_Splice, MVT::nxv4bf16, 1 },
		{ TTI::SK_Splice, MVT::nxv8bf16, 1 },
		david-armUnsubmitted Done Reply Inline Actions I think at the point we call this function the type has been legalised and split into LT.first (a multiple of a legal type) and LT.second (a legal type). So I think I'd expect the default case to be unreachable here perhaps? david-arm: I think at the point we call this function the type has been legalised and split into LT.first…
		{ TTI::SK_Splice, MVT::nxv2f32, 1 },
		{ TTI::SK_Splice, MVT::nxv4f32, 1 },
		{ TTI::SK_Splice, MVT::nxv2f64, 1 },
		};

		std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
		sdesmalenUnsubmitted Done Reply Inline Actions nit: redundant whitespace. sdesmalen: nit: redundant whitespace.
		Type *LegalVTy = EVT(LT.second).getTypeForEVT(Tp->getContext());
		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
		EVT PromotedVT = LT.second.getScalarType() == MVT::i1
		? TLI->getPromotedVTForPredicate(EVT(LT.second))
		: LT.second;
		Type *PromotedVTy = EVT(PromotedVT).getTypeForEVT(Tp->getContext());
		InstructionCost LegalizationCost = 0;
		if (Index < 0) {
		LegalizationCost =
		getCmpSelInstrCost(Instruction::ICmp, PromotedVTy, PromotedVTy,
		CmpInst::BAD_ICMP_PREDICATE, CostKind) +
		getCmpSelInstrCost(Instruction::Select, PromotedVTy, LegalVTy,
		CmpInst::BAD_ICMP_PREDICATE, CostKind);
		}

		// Predicated splice are promoted when lowering. See AArch64ISelLowering.cpp
		sdesmalenUnsubmitted Done Reply Inline Actions This could just use `getCastInstrCost` instead of the custom table? sdesmalen: This could just use `getCastInstrCost` instead of the custom table?
		// Cost performed on a promoted type.
		if (LT.second.getScalarType() == MVT::i1) {
		LegalizationCost +=
		getCastInstrCost(Instruction::ZExt, PromotedVTy, LegalVTy,
		sdesmalenUnsubmitted Done Reply Inline Actions If above you write: std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp); MVT PromotedVT = LT.second.getScalarType() == MVT::i1 ? getPromotedTypeForPredicate(LT.second) : LT.second; Then you can drop the IsPredicated and instead inline `PromotedVT.getScalarType() == MVT::i1` in the condition below. sdesmalen: If above you write: std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL…
		TTI::CastContextHint::None, CostKind) +
		getCastInstrCost(Instruction::Trunc, LegalVTy, PromotedVTy,
		sdesmalenUnsubmitted Done Reply Inline Actions nit: `PromotedTy` sdesmalen: nit: `PromotedTy`
		TTI::CastContextHint::None, CostKind);
		}
		const auto *Entry =
		CostTableLookup(ShuffleTbl, TTI::SK_Splice, PromotedVT.getSimpleVT());
		sdesmalenUnsubmitted Done Reply Inline Actions The compare is always an integer compare, i,e. `cmp ge <0, 1, 2, 3, ... N-1>, <idx, idx, idx, ... idx>` sdesmalen: The compare is always an integer compare, i,e. `cmp ge <0, 1, 2, 3, ... N-1>, <idx, idx, idx, .
		sdesmalenUnsubmitted Done Reply Inline Actions I'm not sure if this matters, but for LegalVTy that's e.g. nxv16i8, the CondTy is nxv16i1, not LegalVTy. sdesmalen: I'm not sure if this matters, but for LegalVTy that's e.g. nxv16i8, the CondTy is nxv16i1, not…
		assert(Entry && "Illegal Type for Splice");
		LegalizationCost += Entry->Cost;
		sdesmalenUnsubmitted Done Reply Inline Actions Should these two selects also be performed on `Promoted`? sdesmalen: Should these two selects also be performed on `Promoted`?
		CarolineConcattoAuthorUnsubmitted Done Reply Inline Actions I was considering use Promoted, but was not sure if it was correct. I've changed now to use the promoted type. CarolineConcatto: I was considering use Promoted, but was not sure if it was correct. I've changed now to use the…
		return LegalizationCost * LT.first;
		}
		sdesmalenUnsubmitted Done Reply Inline Actions nit: add newline after this, and maybe add a one line comment saying that this implements the cost of the operation being performed on a promoted type. sdesmalen: nit: add newline after this, and maybe add a one line comment saying that this implements the…

InstructionCost AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,		InstructionCost AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
VectorType *Tp,		VectorType *Tp,
ArrayRef<int> Mask, int Index,		ArrayRef<int> Mask, int Index,
VectorType *SubTp) {		VectorType *SubTp) {
		david-armUnsubmitted Done Reply Inline Actions nit: Can you fix the formatting here please? Thanks! david-arm: nit: Can you fix the formatting here please? Thanks!
Kind = improveShuffleKindFromMask(Kind, Mask);		Kind = improveShuffleKindFromMask(Kind, Mask);
		sdesmalenUnsubmitted Done Reply Inline Actions I think the cost has to find one, otherwise we have an unhandled/illegal type. So instead of `if`, this should have an `assert` that Entry != nullptr. sdesmalen: I think the cost has to find one, otherwise we have an unhandled/illegal type. So instead of…
if (Kind == TTI::SK_Broadcast \|\| Kind == TTI::SK_Transpose \|\|		if (Kind == TTI::SK_Broadcast \|\| Kind == TTI::SK_Transpose \|\|
Kind == TTI::SK_Select \|\| Kind == TTI::SK_PermuteSingleSrc \|\|		Kind == TTI::SK_Select \|\| Kind == TTI::SK_PermuteSingleSrc \|\|
Kind == TTI::SK_Reverse) {		Kind == TTI::SK_Reverse) {
static const CostTblEntry ShuffleTbl[] = {		static const CostTblEntry ShuffleTbl[] = {
// Broadcast shuffle kinds can be performed with 'dup'.		// Broadcast shuffle kinds can be performed with 'dup'.
{ TTI::SK_Broadcast, MVT::v8i8, 1 },		{ TTI::SK_Broadcast, MVT::v8i8, 1 },
		sdesmalenUnsubmitted Done Reply Inline Actions s/Ilegal/Illegal/ sdesmalen: s/Ilegal/Illegal/
{ TTI::SK_Broadcast, MVT::v16i8, 1 },		{ TTI::SK_Broadcast, MVT::v16i8, 1 },
{ TTI::SK_Broadcast, MVT::v4i16, 1 },		{ TTI::SK_Broadcast, MVT::v4i16, 1 },
{ TTI::SK_Broadcast, MVT::v8i16, 1 },		{ TTI::SK_Broadcast, MVT::v8i16, 1 },
{ TTI::SK_Broadcast, MVT::v2i32, 1 },		{ TTI::SK_Broadcast, MVT::v2i32, 1 },
		sdesmalenUnsubmitted Done Reply Inline Actions If LT.first is `unsigned`, the if-condition is redundant, you can write return LegalizationCost * LT.first; directly. sdesmalen: If LT.first is `unsigned`, the if-condition is redundant, you can write return…
{ TTI::SK_Broadcast, MVT::v4i32, 1 },		{ TTI::SK_Broadcast, MVT::v4i32, 1 },
{ TTI::SK_Broadcast, MVT::v2i64, 1 },		{ TTI::SK_Broadcast, MVT::v2i64, 1 },
{ TTI::SK_Broadcast, MVT::v2f32, 1 },		{ TTI::SK_Broadcast, MVT::v2f32, 1 },
{ TTI::SK_Broadcast, MVT::v4f32, 1 },		{ TTI::SK_Broadcast, MVT::v4f32, 1 },
{ TTI::SK_Broadcast, MVT::v2f64, 1 },		{ TTI::SK_Broadcast, MVT::v2f64, 1 },
// Transpose shuffle kinds can be performed with 'trn1/trn2' and		// Transpose shuffle kinds can be performed with 'trn1/trn2' and
// 'zip1/zip2' instructions.		// 'zip1/zip2' instructions.
{ TTI::SK_Transpose, MVT::v8i8, 1 },		{ TTI::SK_Transpose, MVT::v8i8, 1 },
… 60 unchanged lines not shown (inside `static const CostTblEntry ShuffleTbl[]`) …
{ TTI::SK_Reverse, MVT::nxv8bf16, 1 },		{ TTI::SK_Reverse, MVT::nxv8bf16, 1 },
{ TTI::SK_Reverse, MVT::nxv2f32, 1 },		{ TTI::SK_Reverse, MVT::nxv2f32, 1 },
{ TTI::SK_Reverse, MVT::nxv4f32, 1 },		{ TTI::SK_Reverse, MVT::nxv4f32, 1 },
{ TTI::SK_Reverse, MVT::nxv2f64, 1 },		{ TTI::SK_Reverse, MVT::nxv2f64, 1 },
{ TTI::SK_Reverse, MVT::nxv16i1, 1 },		{ TTI::SK_Reverse, MVT::nxv16i1, 1 },
{ TTI::SK_Reverse, MVT::nxv8i1, 1 },		{ TTI::SK_Reverse, MVT::nxv8i1, 1 },
{ TTI::SK_Reverse, MVT::nxv4i1, 1 },		{ TTI::SK_Reverse, MVT::nxv4i1, 1 },
{ TTI::SK_Reverse, MVT::nxv2i1, 1 },		{ TTI::SK_Reverse, MVT::nxv2i1, 1 },
};		};
		sdesmalen [Not Done]: Separate from the type, I think we'll also need to distinguish the costs based on the value of the index. Given two scalable vectors <x0, x1, x2, x3> and <y0, y1, y2, y3>, for a positive offset we can use SVE's EXT instruction: splicing at offset #1 gives <x1, x2, x3, y0>. For a negative offset we can't use EXT, but we can instead use SPLICE, which requires (generating) a predicate. For a negative offset of 1 we need the predicate <0, 0, 0, 1>, so the operation can be done with whilelt+not+splice, which makes negative offsets more expensive.
std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);		std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
if (const auto *Entry = CostTableLookup(ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
}		}
		if (Kind == TTI::SK_Splice && isa<ScalableVectorType>(Tp))
		return getSpliceCost(Tp, Index);
return BaseT::getShuffleCost(Kind, Tp, Mask, Index, SubTp);		return BaseT::getShuffleCost(Kind, Tp, Mask, Index, SubTp);
}		}
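sdesmalen's comment on the index hinges on what the intrinsic returns for positive and negative immediates. The following is a small reference model of `llvm.experimental.vector.splice` on fixed-length inputs, written as a sketch for illustration: `spliceRef` and `negativeSplicePredicate` are hypothetical helpers, not LLVM APIs. It treats a non-negative immediate as the start index into concat(V1, V2) (the EXT case) and a negative immediate as "take the last -Imm elements of V1, then fill from V2" (the SPLICE case), and it builds the predicate that the whilelt+not sequence would produce.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference model of llvm.experimental.vector.splice:
// concatenate V1 and V2, then take N elements starting at Imm (Imm >= 0),
// or at N + Imm (Imm < 0, i.e. the last -Imm elements of V1 come first).
template <typename T>
std::vector<T> spliceRef(const std::vector<T> &V1, const std::vector<T> &V2,
                         int Imm) {
  assert(V1.size() == V2.size() && "operands must have the same type");
  const int N = static_cast<int>(V1.size());
  // Assumed valid range for the immediate: -N <= Imm < N.
  assert(Imm >= -N && Imm < N && "immediate out of range");
  const int Start = Imm >= 0 ? Imm : N + Imm;
  std::vector<T> Concat(V1);
  Concat.insert(Concat.end(), V2.begin(), V2.end());
  return std::vector<T>(Concat.begin() + Start, Concat.begin() + Start + N);
}

// Hypothetical helper: the governing predicate a negative-offset SPLICE
// needs, i.e. only the last -Imm lanes active. whilelt yields lanes
// [0, N + Imm) active; inverting it ("not") leaves the trailing lanes.
std::vector<bool> negativeSplicePredicate(int N, int Imm) {
  assert(Imm < 0 && -Imm <= N && "expects a negative in-range offset");
  std::vector<bool> Pred(N);
  for (int I = 0; I < N; ++I)
    Pred[I] = !(I < N + Imm);
  return Pred;
}
```

With V1 = <0, 1, 2, 3> and V2 = <4, 5, 6, 7>, an offset of 1 yields <1, 2, 3, 4> (one EXT), while an offset of -1 yields <3, 4, 5, 6> and needs the predicate <0, 0, 0, 1>, which is why the negative case carries extra instructions.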
		sdesmalen [Not Done]: The predicates require two stores, a reload and an additional compare operation. Since predicates don't have a dedicated splice instruction, it should be fair to model the cost as that of two stores, a reload and a compare.
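The review comments above suggest a cost that depends on the index sign, the element type, and how many legal registers the type splits into. The sketch below is a hypothetical, self-contained model of that reasoning, not the patch's `getSpliceCost`: `NumParts` stands in for `LT.first`, and the overhead constants (2 extra operations for a negative index, matching whilelt+not, and an assumed 2 extra for i1 predicate types) were chosen because they reproduce the scalable-vector CHECK values in `sve-intrinsics.ll`.

```cpp
#include <cassert>

// Hypothetical model of the scalable-vector splice cost; the constants are
// assumptions picked to match the expected costs in the tests, not values
// taken from the patch.
struct SpliceCostModel {
  int BaseCost = 1;          // one EXT or SPLICE on a legal data vector
  int NegIndexOverhead = 2;  // whilelt + not to build the SPLICE predicate
  int PredicateOverhead = 2; // assumed cost of handling i1 element types

  // NumParts plays the role of LT.first (number of legal registers).
  int cost(int NumParts, bool IsPredicate, int Index) const {
    assert(NumParts >= 1 && "type must legalize to at least one register");
    int PerPart = BaseCost;
    if (Index < 0)
      PerPart += NegIndexOverhead;
    if (IsPredicate)
      PerPart += PredicateOverhead;
    // A split type repeats the work once per legal register.
    return PerPart * NumParts;
  }
};
```

Under these assumptions, `<vscale x 16 x i8>` costs 1 at index 1 and 3 at index -1, the split `<vscale x 32 x i8>` doubles both to 2 and 6, and `<vscale x 16 x i1>` costs 3 and 5. Multiplying once by the part count at the end is also why the `if` on `LT.first` is unnecessary.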

llvm/test/Analysis/CostModel/AArch64/splice.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; RUN: opt < %s -analyze -cost-model -S -mtriple=aarch64--linux-gnu | FileCheck %s

				define void @vector_splice() #0 {
				; CHECK-LABEL: 'vector_splice'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 90 for instruction: %splice.v16i8 = call <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 180 for instruction: %splice.v32i8 = call <32 x i8> @llvm.experimental.vector.splice.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2i16 = call <2 x i16> @llvm.experimental.vector.splice.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4i16 = call <4 x i16> @llvm.experimental.vector.splice.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8i16 = call <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %splice.v16i16 = call <16 x i16> @llvm.experimental.vector.splice.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4i32 = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %splice.v8i32 = call <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2i64 = call <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %splice.v4i64 = call <4 x i64> @llvm.experimental.vector.splice.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2f16 = call <2 x half> @llvm.experimental.vector.splice.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4f16 = call <4 x half> @llvm.experimental.vector.splice.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8f16 = call <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %splice.v16f16 = call <16 x half> @llvm.experimental.vector.splice.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2f32 = call <2 x float> @llvm.experimental.vector.splice.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4f32 = call <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %splice.v8f32 = call <8 x float> @llvm.experimental.vector.splice.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2f64 = call <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %splice.v4f64 = call <4 x double> @llvm.experimental.vector.splice.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2bf16 = call <2 x bfloat> @llvm.experimental.vector.splice.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4bf16 = call <4 x bfloat> @llvm.experimental.vector.splice.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8bf16 = call <8 x bfloat> @llvm.experimental.vector.splice.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %splice.v16bf16 = call <16 x bfloat> @llvm.experimental.vector.splice.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 90 for instruction: %splice.v16i1 = call <16 x i1> @llvm.experimental.vector.splice.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %splice.v8i1 = call <8 x i1> @llvm.experimental.vector.splice.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %splice.v4i1 = call <4 x i1> @llvm.experimental.vector.splice.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice.v2i1 = call <2 x i1> @llvm.experimental.vector.splice.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
				;
				%splice.v16i8 = call <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8> zeroinitializer, <16 x i8> zeroinitializer, i32 1)
				%splice.v32i8 = call <32 x i8> @llvm.experimental.vector.splice.v32i8(<32 x i8> zeroinitializer, <32 x i8> zeroinitializer, i32 1)
				sdesmalen [Done]: nv?
				%splice.v2i16 = call <2 x i16> @llvm.experimental.vector.splice.v2i16(<2 x i16> zeroinitializer, <2 x i16> zeroinitializer, i32 1)
				%splice.v4i16 = call <4 x i16> @llvm.experimental.vector.splice.v4i16(<4 x i16> zeroinitializer, <4 x i16> zeroinitializer, i32 1)
				%splice.v8i16 = call <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16> zeroinitializer, <8 x i16> zeroinitializer, i32 1)
				%splice.v16i16 = call <16 x i16> @llvm.experimental.vector.splice.v16i16(<16 x i16> zeroinitializer, <16 x i16> zeroinitializer, i32 1)
				%splice.v4i32 = call <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32> zeroinitializer, <4 x i32> zeroinitializer, i32 1)
				%splice.v8i32 = call <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32> zeroinitializer, <8 x i32> zeroinitializer, i32 1)
				%splice.v2i64 = call <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64> zeroinitializer, <2 x i64> zeroinitializer, i32 1)
				%splice.v4i64 = call <4 x i64> @llvm.experimental.vector.splice.v4i64(<4 x i64> zeroinitializer, <4 x i64> zeroinitializer, i32 1)
				%splice.v2f16 = call <2 x half> @llvm.experimental.vector.splice.v2f16(<2 x half> zeroinitializer, <2 x half> zeroinitializer, i32 1)
				%splice.v4f16 = call <4 x half> @llvm.experimental.vector.splice.v4f16(<4 x half> zeroinitializer, <4 x half> zeroinitializer, i32 1)
				%splice.v8f16 = call <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half> zeroinitializer, <8 x half> zeroinitializer, i32 1)
				%splice.v16f16 = call <16 x half> @llvm.experimental.vector.splice.v16f16(<16 x half> zeroinitializer, <16 x half> zeroinitializer, i32 1)
				%splice.v2f32 = call <2 x float> @llvm.experimental.vector.splice.v2f32(<2 x float> zeroinitializer, <2 x float> zeroinitializer, i32 1)
				%splice.v4f32 = call <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float> zeroinitializer, <4 x float> zeroinitializer, i32 1)
				%splice.v8f32 = call <8 x float> @llvm.experimental.vector.splice.v8f32(<8 x float> zeroinitializer, <8 x float> zeroinitializer, i32 1)
				%splice.v2f64 = call <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double> zeroinitializer, <2 x double> zeroinitializer, i32 1)
				%splice.v4f64 = call <4 x double> @llvm.experimental.vector.splice.v4f64(<4 x double> zeroinitializer, <4 x double> zeroinitializer, i32 1)
				%splice.v2bf16 = call <2 x bfloat> @llvm.experimental.vector.splice.v2bf16(<2 x bfloat> zeroinitializer, <2 x bfloat> zeroinitializer, i32 1)
				%splice.v4bf16 = call <4 x bfloat> @llvm.experimental.vector.splice.v4bf16(<4 x bfloat> zeroinitializer, <4 x bfloat> zeroinitializer, i32 1)
				%splice.v8bf16 = call <8 x bfloat> @llvm.experimental.vector.splice.v8bf16(<8 x bfloat> zeroinitializer, <8 x bfloat> zeroinitializer, i32 1)
				%splice.v16bf16 = call <16 x bfloat> @llvm.experimental.vector.splice.v16bf16(<16 x bfloat> zeroinitializer, <16 x bfloat> zeroinitializer, i32 1)
				%splice.v16i1 = call <16 x i1> @llvm.experimental.vector.splice.v16i1(<16 x i1> zeroinitializer, <16 x i1> zeroinitializer, i32 1)
				%splice.v8i1 = call <8 x i1> @llvm.experimental.vector.splice.v8i1(<8 x i1> zeroinitializer, <8 x i1> zeroinitializer, i32 1)
				%splice.v4i1 = call <4 x i1> @llvm.experimental.vector.splice.v4i1(<4 x i1> zeroinitializer, <4 x i1> zeroinitializer, i32 1)
				%splice.v2i1 = call <2 x i1> @llvm.experimental.vector.splice.v2i1(<2 x i1> zeroinitializer, <2 x i1> zeroinitializer, i32 1)
				sdesmalen [Done]: odd spaces.
				ret void
				}

				declare <2 x i1> @llvm.experimental.vector.splice.v2i1(<2 x i1>, <2 x i1>, i32)
				declare <4 x i1> @llvm.experimental.vector.splice.v4i1(<4 x i1>, <4 x i1>, i32)
				declare <8 x i1> @llvm.experimental.vector.splice.v8i1(<8 x i1>, <8 x i1>, i32)
				declare <16 x i1> @llvm.experimental.vector.splice.v16i1(<16 x i1>, <16 x i1>, i32)
				declare <2 x i8> @llvm.experimental.vector.splice.v2i8(<2 x i8>, <2 x i8>, i32)
				declare <16 x i8> @llvm.experimental.vector.splice.v16i8(<16 x i8>, <16 x i8>, i32)
				declare <32 x i8> @llvm.experimental.vector.splice.v32i8(<32 x i8>, <32 x i8>, i32)
				declare <2 x i16> @llvm.experimental.vector.splice.v2i16(<2 x i16>, <2 x i16>, i32)
				declare <4 x i16> @llvm.experimental.vector.splice.v4i16(<4 x i16>, <4 x i16>, i32)
				declare <8 x i16> @llvm.experimental.vector.splice.v8i16(<8 x i16>, <8 x i16>, i32)
				declare <16 x i16> @llvm.experimental.vector.splice.v16i16(<16 x i16>, <16 x i16>, i32)
				declare <4 x i32> @llvm.experimental.vector.splice.v4i32(<4 x i32>, <4 x i32>, i32)
				declare <8 x i32> @llvm.experimental.vector.splice.v8i32(<8 x i32>, <8 x i32>, i32)
				declare <2 x i64> @llvm.experimental.vector.splice.v2i64(<2 x i64>, <2 x i64>, i32)
				declare <4 x i64> @llvm.experimental.vector.splice.v4i64(<4 x i64>, <4 x i64>, i32)
				declare <2 x half> @llvm.experimental.vector.splice.v2f16(<2 x half>, <2 x half>, i32)
				declare <4 x half> @llvm.experimental.vector.splice.v4f16(<4 x half>, <4 x half>, i32)
				declare <8 x half> @llvm.experimental.vector.splice.v8f16(<8 x half>, <8 x half>, i32)
				declare <16 x half> @llvm.experimental.vector.splice.v16f16(<16 x half>, <16 x half>, i32)
				declare <2 x bfloat> @llvm.experimental.vector.splice.v2bf16(<2 x bfloat>, <2 x bfloat>, i32)
				declare <4 x bfloat> @llvm.experimental.vector.splice.v4bf16(<4 x bfloat>, <4 x bfloat>, i32)
				declare <8 x bfloat> @llvm.experimental.vector.splice.v8bf16(<8 x bfloat>, <8 x bfloat>, i32)
				declare <16 x bfloat> @llvm.experimental.vector.splice.v16bf16(<16 x bfloat>, <16 x bfloat>, i32)
				declare <2 x float> @llvm.experimental.vector.splice.v2f32(<2 x float>, <2 x float>, i32)
				declare <4 x float> @llvm.experimental.vector.splice.v4f32(<4 x float>, <4 x float>, i32)
				declare <8 x float> @llvm.experimental.vector.splice.v8f32(<8 x float>, <8 x float>, i32)
				declare <16 x float> @llvm.experimental.vector.splice.v16f32(<16 x float>, <16 x float>, i32)
				declare <2 x double> @llvm.experimental.vector.splice.v2f64(<2 x double>, <2 x double>, i32)
				declare <4 x double> @llvm.experimental.vector.splice.v4f64(<4 x double>, <4 x double>, i32)

				attributes #0 = { "target-features"="+bf16" }
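The fixed-length costs in `splice.ll` fall back to the generic per-element shuffle estimate, and their pattern can be reproduced with simple arithmetic. This sketch assumes the fallback charges one insertelement plus one extractelement per lane, and that AArch64 prices each at 3 except lane 0 of every legal register, which is free; `fixedSpliceCostEstimate` and the legal widths passed to it (for example `<32 x i8>` split into two `<16 x i8>` registers) are illustrative assumptions, not LLVM code.

```cpp
#include <cassert>

// Hypothetical estimate of the generic fixed-vector shuffle overhead:
// every lane pays for one insert and one extract, except the first lane of
// each legal register, which is assumed to be free.
int fixedSpliceCostEstimate(int NumElts, int LegalWidth, int InsExtCost = 3) {
  assert(NumElts > 0 && LegalWidth > 0 && "expects non-empty vectors");
  int Cost = 0;
  for (int I = 0; I < NumElts; ++I) {
    const bool FreeLane = (I % LegalWidth) == 0; // lane 0 of a register part
    const int LaneCost = FreeLane ? 0 : InsExtCost;
    Cost += LaneCost /*insert*/ + LaneCost /*extract*/;
  }
  return Cost;
}
```

For `<8 x i32>`, assumed to split into two `<4 x i32>` registers, lanes 0 and 4 are free and the remaining 6 lanes cost 6 each, giving 36, which matches the `%splice.v8i32` line; `<16 x i8>` gives 15 × 6 = 90.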

llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll

	; RUN: opt -cost-model -analyze -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; RUN: opt < %s -analyze -cost-model -S -mtriple=aarch64--linux-gnu -mattr=+sve | FileCheck %s

	define void @vector_insert_extract(<vscale x 4 x i32> %v0, <vscale x 16 x i32> %v1, <16 x i32> %v2) {			define void @vector_insert_extract(<vscale x 4 x i32> %v0, <vscale x 16 x i32> %v1, <16 x i32> %v2) {
	; CHECK-LABEL: 'vector_insert_extract'			; CHECK-LABEL: 'vector_insert_extract'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.experimental.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.experimental.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.experimental.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)
	%extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)			%extract_fixed_from_scalable = call <16 x i32> @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
	… 216 unchanged lines not shown …
	declare <vscale x 4 x float> @llvm.pow.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.pow.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.powi.nxv4f32.i32(<vscale x 4 x float>, i32)			declare <vscale x 4 x float> @llvm.powi.nxv4f32.i32(<vscale x 4 x float>, i32)
	declare <vscale x 4 x float> @llvm.exp.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.exp.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.log.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.log.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.log2.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.log2.nxv4f32(<vscale x 4 x float>)
	declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.log10.nxv4f32(<vscale x 4 x float>)

				define void @vector_splice() #0 {
				; CHECK-LABEL: 'vector_splice'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv16i8 = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv32i8 = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2i16 = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4i16 = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv8i16 = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16i16 = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4i32 = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8i32 = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2i64 = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4i64 = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2f16 = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4f16 = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv8f16 = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16f16 = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2f32 = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4f32 = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv8f32 = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2f64 = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv4f64 = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv2bf16 = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv4bf16 = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %splice_nxv8bf16 = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %splice_nxv16bf16 = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv16i1 = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8i1 = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4i1 = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv16i8_neg = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv32i8_neg = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2i16_neg = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4i16_neg = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8i16_neg = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv16i16_neg = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4i32_neg = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv8i32_neg = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2i64_neg = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv4i64_neg = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2f16_neg = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4f16_neg = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8f16_neg = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv16f16_neg = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2f32_neg = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4f32_neg = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv8f32_neg = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2f64_neg = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv4f64_neg = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv2bf16_neg = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv4bf16_neg = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %splice_nxv8bf16_neg = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %splice_nxv16bf16_neg = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv16i1_neg = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv8i1_neg = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)

				%splice_nxv16i8 = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 1)
				%splice_nxv32i8 = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 1)
				%splice_nxv2i16 = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 1)
				%splice_nxv4i16 = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 1)
				%splice_nxv8i16 = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 1)
				%splice_nxv16i16 = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 1)
				%splice_nxv4i32 = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 1)
				%splice_nxv8i32 = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 1)
				%splice_nxv2i64 = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 1)
				%splice_nxv4i64 = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 1)
				%splice_nxv2f16 = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 1)
				%splice_nxv4f16 = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 1)
				%splice_nxv8f16 = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 1)
				%splice_nxv16f16 = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 1)
				%splice_nxv2f32 = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 1)
				%splice_nxv4f32 = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 1)
				%splice_nxv8f32 = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 1)
				%splice_nxv2f64 = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 1)
				%splice_nxv4f64 = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 1)
				%splice_nxv2bf16 = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 1)
				%splice_nxv4bf16 = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 1)
				%splice_nxv8bf16 = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 1)
				%splice_nxv16bf16 = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 1)
				%splice_nxv16i1 = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 1)
				%splice_nxv8i1 = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 1)
				%splice_nxv4i1 = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 1)
				%splice_nxv2i1 = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1)
				;; Negative index
				%splice_nxv16i8_neg = call <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8> zeroinitializer, i32 -1)
				%splice_nxv32i8_neg = call <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8> zeroinitializer, <vscale x 32 x i8> zeroinitializer, i32 -1)
				%splice_nxv2i16_neg = call <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16> zeroinitializer, <vscale x 2 x i16> zeroinitializer, i32 -1)
				%splice_nxv4i16_neg = call <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16> zeroinitializer, <vscale x 4 x i16> zeroinitializer, i32 -1)
				%splice_nxv8i16_neg = call <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i16> zeroinitializer, i32 -1)
				%splice_nxv16i16_neg = call <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16> zeroinitializer, <vscale x 16 x i16> zeroinitializer, i32 -1)
				%splice_nxv4i32_neg = call <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i32> zeroinitializer, i32 -1)
				%splice_nxv8i32_neg = call <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32> zeroinitializer, <vscale x 8 x i32> zeroinitializer, i32 -1)
				%splice_nxv2i64_neg = call <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64> zeroinitializer, i32 -1)
				%splice_nxv4i64_neg = call <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64> zeroinitializer, <vscale x 4 x i64> zeroinitializer, i32 -1)
				%splice_nxv2f16_neg = call <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half> zeroinitializer, <vscale x 2 x half> zeroinitializer, i32 -1)
				%splice_nxv4f16_neg = call <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half> zeroinitializer, <vscale x 4 x half> zeroinitializer, i32 -1)
				%splice_nxv8f16_neg = call <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half> zeroinitializer, <vscale x 8 x half> zeroinitializer, i32 -1)
				%splice_nxv16f16_neg = call <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half> zeroinitializer, <vscale x 16 x half> zeroinitializer, i32 -1)
				%splice_nxv2f32_neg = call <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float> zeroinitializer, <vscale x 2 x float> zeroinitializer, i32 -1)
				%splice_nxv4f32_neg = call <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x float> zeroinitializer, i32 -1)
				%splice_nxv8f32_neg = call <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float> zeroinitializer, <vscale x 8 x float> zeroinitializer, i32 -1)
				%splice_nxv2f64_neg = call <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x double> zeroinitializer, i32 -1)
				%splice_nxv4f64_neg = call <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double> zeroinitializer, <vscale x 4 x double> zeroinitializer, i32 -1)
				%splice_nxv2bf16_neg = call <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat> zeroinitializer, <vscale x 2 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv4bf16_neg = call <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat> zeroinitializer, <vscale x 4 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv8bf16_neg = call <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv16bf16_neg = call <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat> zeroinitializer, <vscale x 16 x bfloat> zeroinitializer, i32 -1)
				%splice_nxv16i1_neg = call <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1> zeroinitializer, <vscale x 16 x i1> zeroinitializer, i32 -1)
				%splice_nxv8i1_neg = call <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1> zeroinitializer, <vscale x 8 x i1> zeroinitializer, i32 -1)
				%splice_nxv4i1_neg = call <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1> zeroinitializer, <vscale x 4 x i1> zeroinitializer, i32 -1)
				%splice_nxv2i1_neg = call <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 -1)
				ret void
				}

				declare <vscale x 2 x i1> @llvm.experimental.vector.splice.nxv2i1(<vscale x 2 x i1>, <vscale x 2 x i1>, i32)
				declare <vscale x 4 x i1> @llvm.experimental.vector.splice.nxv4i1(<vscale x 4 x i1>, <vscale x 4 x i1>, i32)
				declare <vscale x 8 x i1> @llvm.experimental.vector.splice.nxv8i1(<vscale x 8 x i1>, <vscale x 8 x i1>, i32)
				declare <vscale x 16 x i1> @llvm.experimental.vector.splice.nxv16i1(<vscale x 16 x i1>, <vscale x 16 x i1>, i32)
				declare <vscale x 2 x i8> @llvm.experimental.vector.splice.nxv2i8(<vscale x 2 x i8>, <vscale x 2 x i8>, i32)
				declare <vscale x 16 x i8> @llvm.experimental.vector.splice.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>, i32)
				declare <vscale x 32 x i8> @llvm.experimental.vector.splice.nxv32i8(<vscale x 32 x i8>, <vscale x 32 x i8>, i32)
				declare <vscale x 2 x i16> @llvm.experimental.vector.splice.nxv2i16(<vscale x 2 x i16>, <vscale x 2 x i16>, i32)
				declare <vscale x 4 x i16> @llvm.experimental.vector.splice.nxv4i16(<vscale x 4 x i16>, <vscale x 4 x i16>, i32)
				declare <vscale x 8 x i16> @llvm.experimental.vector.splice.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, i32)
				declare <vscale x 16 x i16> @llvm.experimental.vector.splice.nxv16i16(<vscale x 16 x i16>, <vscale x 16 x i16>, i32)
				declare <vscale x 4 x i32> @llvm.experimental.vector.splice.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, i32)
				declare <vscale x 8 x i32> @llvm.experimental.vector.splice.nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32>, i32)
				declare <vscale x 2 x i64> @llvm.experimental.vector.splice.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, i32)
				declare <vscale x 4 x i64> @llvm.experimental.vector.splice.nxv4i64(<vscale x 4 x i64>, <vscale x 4 x i64>, i32)
				declare <vscale x 2 x half> @llvm.experimental.vector.splice.nxv2f16(<vscale x 2 x half>, <vscale x 2 x half>, i32)
				declare <vscale x 4 x half> @llvm.experimental.vector.splice.nxv4f16(<vscale x 4 x half>, <vscale x 4 x half>, i32)
				declare <vscale x 8 x half> @llvm.experimental.vector.splice.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, i32)
				declare <vscale x 16 x half> @llvm.experimental.vector.splice.nxv16f16(<vscale x 16 x half>, <vscale x 16 x half>, i32)
				declare <vscale x 2 x bfloat> @llvm.experimental.vector.splice.nxv2bf16(<vscale x 2 x bfloat>, <vscale x 2 x bfloat>, i32)
				declare <vscale x 4 x bfloat> @llvm.experimental.vector.splice.nxv4bf16(<vscale x 4 x bfloat>, <vscale x 4 x bfloat>, i32)
				declare <vscale x 8 x bfloat> @llvm.experimental.vector.splice.nxv8bf16(<vscale x 8 x bfloat>, <vscale x 8 x bfloat>, i32)
				declare <vscale x 16 x bfloat> @llvm.experimental.vector.splice.nxv16bf16(<vscale x 16 x bfloat>, <vscale x 16 x bfloat>, i32)
				declare <vscale x 2 x float> @llvm.experimental.vector.splice.nxv2f32(<vscale x 2 x float>, <vscale x 2 x float>, i32)
				declare <vscale x 4 x float> @llvm.experimental.vector.splice.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, i32)
				declare <vscale x 8 x float> @llvm.experimental.vector.splice.nxv8f32(<vscale x 8 x float>, <vscale x 8 x float>, i32)
				declare <vscale x 16 x float> @llvm.experimental.vector.splice.nxv16f32(<vscale x 16 x float>, <vscale x 16 x float>, i32)
				declare <vscale x 2 x double> @llvm.experimental.vector.splice.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, i32)
				declare <vscale x 4 x double> @llvm.experimental.vector.splice.nxv4f64(<vscale x 4 x double>, <vscale x 4 x double>, i32)

				attributes #0 = { "target-features"="+sve,+bf16" }
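The negative-index CHECK lines above fit a simple additive pattern. Legal types such as nxv8bf16 cost 3. Types that split into two legal registers, such as nxv4f64 and nxv16bf16, cost 6. The i1 predicate types cost 5. The sketch below is a minimal model that reproduces those numbers. It assumes three things: each individual operation costs 1, a negative index adds a compare plus a select, and i1 vectors add an extend and a truncate because they are processed as a promoted data vector. The function and parameter names are hypothetical. This is not the AArch64TTIImpl code; it only illustrates how the numbers could compose.

```cpp
#include <cassert>

// Hypothetical additive model of the SVE splice cost, inferred from the
// CHECK lines in this test. NOT the actual AArch64TTIImpl implementation.
// Assumptions: every single operation costs 1, and a type that needs
// numLegalParts legal registers multiplies the per-register cost by that count.
int spliceCostModel(int numLegalParts, bool isPredicate, int index) {
  int perPart = 1;   // the splice operation itself
  if (index < 0)
    perPart += 2;    // assumed compare + select for a negative (trailing-element) index
  if (isPredicate)
    perPart += 2;    // assumed extend of the i1 vector to a data vector, then truncate back
  return perPart * numLegalParts;
}
```

For example, `spliceCostModel(1, true, -1)` evaluates to 1 + 2 + 2 = 5, which matches the estimate for `%splice_nxv16i1_neg`. `spliceCostModel(2, false, -1)` evaluates to 2 * 3 = 6, which matches `%splice_nxv4f64_neg`.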