This is an archive of the discontinued LLVM Phabricator instance.

[Analysis][AArch64] Add on overhead costs for fixed-width gathers and scatters
Abandoned, Public

Authored by david-arm on Dec 6 2021, 4:56 AM.

Details

Reviewers

sdesmalen
CarolineConcatto
RosieSumpter
dmgreen

Summary

This patch increases the cost of fixed-width gathers and scatters that
will end up being scalarised anyway. When SVE is enabled, the vectoriser
currently still often favours fixed-width vectorisation for AArch64 loops with
gathers and scatters, even when the scalar loop is actually faster.
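
As a rough illustration of why the gather price matters here, the sketch below assumes the vectoriser prefers whichever candidate has the lower estimated cost per original loop iteration (loop cost divided by vectorisation factor). The `Candidate` struct, the scalar body cost of 6 and the 3 units of non-gather work are hypothetical; only the gather costs 17 and 34 come from the NEON expectations for the `<4 x i8>` gather in this revision's test changes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a vectorisation candidate: the vectorisation
// factor (VF == 1 is the scalar loop) and the estimated cost of one
// iteration of the loop body at that VF.
struct Candidate {
  unsigned VF;
  uint64_t LoopCost;
};

// A beats B when A.LoopCost / A.VF < B.LoopCost / B.VF. The comparison is
// cross-multiplied so that it stays in integer arithmetic.
inline bool isCheaperPerLane(const Candidate &A, const Candidate &B) {
  return A.LoopCost * B.VF < B.LoopCost * A.VF;
}

// Assumed numbers: a scalar body costing 6, and a VF=4 body made of one
// gather plus 3 units of other work.
inline void checkSummaryExample() {
  const Candidate Scalar{1, 6};
  const Candidate FixedBefore{4, 17 + 3}; // gather priced at 17
  const Candidate FixedAfter{4, 34 + 3};  // gather priced at 34

  assert(isCheaperPerLane(FixedBefore, Scalar)); // 20/4 < 6: vector chosen
  assert(!isCheaperPerLane(FixedAfter, Scalar)); // 37/4 > 6: scalar kept
}
```

With these assumed numbers, doubling the gather cost is enough to flip the decision back to the scalar loop, which is the effect the summary is after.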

Diff Detail

Event Timeline

david-arm created this revision. Dec 6 2021, 4:56 AM

Herald added subscribers: ctetreau, hiraditya, kristof.beyls. Dec 6 2021, 4:56 AM

david-arm requested review of this revision. Dec 6 2021, 4:56 AM

Herald added a project: Restricted Project. Dec 6 2021, 4:56 AM

Herald added a subscriber: llvm-commits.

david-arm added a parent revision: D115143: [AArch64][Analysis] Add on overhead costs for SVE gathers and scatters. Dec 6 2021, 4:56 AM

Harbormaster completed remote builds in B137631: Diff 392025. Dec 6 2021, 6:05 AM

Rebase.

This patch increases the cost of fixed-width gathers and scatters that will end up scalarising anyway.

Do you know if isLegalMaskedGather/Scatter is returning true for these gathers that will just be scalarized? It can be difficult at times to make isLegalMaskedGather very precise.

In D115145#3175745, @dmgreen wrote:

This patch increases the cost of fixed-width gathers and scatters that will end up scalarising anyway.

Do you know if isLegalMaskedGather/Scatter is returning true for these gathers that will just be scalarized? It can be difficult at times to make isLegalMaskedGather very precise.

Yeah, the problem is that the vectoriser currently passes only the element type to isLegalMaskedGather/Scatter and isLegalMaskedLoad/Store, so we have no way to distinguish between fixed-width and scalable vectors except via the cost model. I think we would also like to fix that at some point, but in the short term this simple cost model change helps.
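
The limitation described in this reply can be sketched in a few lines. Everything below is a simplified, hypothetical model (`ElemKind`, `VecType`, `isLegalGatherForElement` and `gatherCost` are not LLVM APIs): a legality hook that only sees the element type has to give the same answer for `<4 x i32>` and `<vscale x 4 x i32>`, whereas a cost hook that sees the whole vector type can still penalise the fixed-width case.

```cpp
#include <cassert>

// Hypothetical, simplified types; none of these are LLVM classes.
enum class ElemKind { I8, I32, F32 };

struct VecType {
  ElemKind Elem;
  unsigned MinElts;
  bool Scalable; // true for <vscale x N x T>, false for <N x T>
};

// Legality query modelled on the situation in the comment: it receives only
// the element type, so fixed-width and scalable vectors are indistinguishable.
inline bool isLegalGatherForElement(ElemKind) { return true; }

// Cost query: it receives the full vector type, so it can apply a penalty
// (chosen by the caller) to fixed-width vectors only.
inline unsigned gatherCost(const VecType &Ty, unsigned BaseCost,
                           unsigned FixedWidthPenalty) {
  return Ty.Scalable ? BaseCost : BaseCost * FixedWidthPenalty;
}

inline void checkLegalityVersusCost() {
  const VecType Fixed{ElemKind::I32, 4, false};
  const VecType Scal{ElemKind::I32, 4, true};
  // Same legality answer for both kinds of vector...
  assert(isLegalGatherForElement(Fixed.Elem) ==
         isLegalGatherForElement(Scal.Elem));
  // ...but the cost model can still tell them apart.
  assert(gatherCost(Fixed, 17, 2) != gatherCost(Scal, 17, 2));
}
```

This is only meant to show where the information is available; the longer-term fix mentioned in the reply would be to give the legality hooks the full type as well.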

Harbormaster completed remote builds in B137840: Diff 392303. Dec 7 2021, 1:53 AM

david-arm abandoned this revision. Dec 16 2021, 7:41 AM

Revision Contents

Path                                                         Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp       21 lines
llvm/test/Analysis/CostModel/AArch64/mem-op-cost-model.ll    48 lines

Diff 392303

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

@@ AArch64TTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
   // it. This change will be removed when code-generation for these types is
   // sufficiently reliable.
   if (cast<VectorType>(Src)->getElementCount() == ElementCount::getScalable(1))
     return InstructionCost::getInvalid();
 
   return LT.first * 2;
 }
 
+static unsigned getGatherScatterOverhead(unsigned Opcode, bool UseSVE) {
+  // TODO: At the moment the SVE cost is applied unilaterally for all CPUs, but
+  // at some point we may want a per-CPU overhead.
+  if (Opcode == Instruction::Store)
+    return UseSVE ? 10 : 4;
+  else
+    return UseSVE ? 10 : 2;
+}
+
 InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
     unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
     Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) {
-  if (useNeonVector(DataTy))
-    return BaseT::getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
-                                         Alignment, CostKind, I);
+  if (useNeonVector(DataTy)) {
+    InstructionCost Cost = BaseT::getGatherScatterOpCost(
+        Opcode, DataTy, Ptr, VariableMask, Alignment, CostKind, I);
+    return Cost * getGatherScatterOverhead(Opcode, false);
+  }
   auto *VT = cast<VectorType>(DataTy);
   auto LT = TLI->getTypeLegalizationCost(DL, DataTy);
   if (!LT.first.isValid())
     return InstructionCost::getInvalid();
 
   // The code-generator is currently not able to handle scalable vectors
   // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
   // it. This change will be removed when code-generation for these types is
   // sufficiently reliable.
   if (cast<VectorType>(DataTy)->getElementCount() ==
       ElementCount::getScalable(1))
     return InstructionCost::getInvalid();
 
   ElementCount LegalVF = LT.second.getVectorElementCount();
   InstructionCost MemOpCost =
       getMemoryOpCost(Opcode, VT->getElementType(), Alignment, 0, CostKind, I);
   // Add on an overhead cost for using gathers/scatters.
-  // TODO: At the moment this is applied unilaterally for all CPUs, but at some
-  // point we may want a per-CPU overhead.
-  MemOpCost *= 10;
+  MemOpCost *= getGatherScatterOverhead(Opcode, true);
   return LT.first * MemOpCost * getMaxNumElements(LegalVF);
 }
 
 bool AArch64TTIImpl::useNeonVector(const Type *Ty) const {
   return isa<FixedVectorType>(Ty) && !ST->useSVEForFixedLengthVectors();
 }
 
 InstructionCost AArch64TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Ty,
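
To make the non-NEON return expression `LT.first * MemOpCost * getMaxNumElements(LegalVF)` concrete, here is a small stand-alone model of it for fixed-length vectors lowered with SVE. It is a sketch under stated assumptions, not the LLVM implementation: the function name is hypothetical, the per-element memory operation is assumed to cost 1 before the overhead of 10 is applied, and the vector is assumed to split evenly into full-width registers (so it does not cover small types such as `<4 x i8>`).

```cpp
#include <cassert>

// Simplified model of
//   LT.first * MemOpCost * getMaxNumElements(LegalVF)
// for a fixed-length vector of NumElts elements of EltBits bits each, on a
// target whose SVE registers are assumed to be RegBits wide.
inline unsigned sveFixedLengthGatherScatterCost(unsigned NumElts,
                                                unsigned EltBits,
                                                unsigned RegBits) {
  const unsigned LanesPerPart = RegBits / EltBits; // models getMaxNumElements
  assert(LanesPerPart != 0 && NumElts % LanesPerPart == 0 &&
         "model only covers vectors that fill whole registers");
  const unsigned NumParts = NumElts / LanesPerPart; // models LT.first
  const unsigned MemOpCost = 1 * 10; // assumed scalar cost 1, overhead 10
  return NumParts * MemOpCost * LanesPerPart;
}
```

Under these assumptions the model gives 2560 for `<256 x i16>` at both 256-bit and 512-bit registers and 5120 for `<512 x half>`, which agrees with the CHECK-SVE-256 and CHECK-SVE-512 expectations in the test file. Because the overhead on this path stays at 10, those expectations are the same before and after the patch.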

llvm/test/Analysis/CostModel/AArch64/mem-op-cost-model.ll

 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 1 for instruction:
   %out = load <8 x i64>, <8 x i64>* %ptr
   ret <8 x i64> %out
 }
 
 declare <4 x i8> @llvm.masked.gather.v4i8.v4p0i8(<4 x i8*>, i32 immarg, <4 x i1>, <4 x i8>)
 define <4 x i8> @gather_load_4xi8_constant_mask(<4 x i8*> %ptrs) {
 ; CHECK: gather_load_4xi8_constant_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
+; CHECK-NEON: Cost Model: Found an estimated cost of 34 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 34 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
 ;
   %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8(<4 x i8*> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)
   ret <4 x i8> %lv
 }
 
 define <4 x i8> @gather_load_4xi8_variable_mask(<4 x i8*> %ptrs, <4 x i1> %cond) {
 ; CHECK: gather_load_4xi8_variable_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
+; CHECK-NEON: Cost Model: Found an estimated cost of 58 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 58 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8
 ;
   %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0i8(<4 x i8*> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)
   ret <4 x i8> %lv
 }
 
 declare void @llvm.masked.scatter.v4i8.v4p0i8(<4 x i8>, <4 x i8*>, i32 immarg, <4 x i1>)
 define void @scatter_store_4xi8_constant_mask(<4 x i8> %val, <4 x i8*> %ptrs) {
 ; CHECK: scatter_store_4xi8_constant_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
+; CHECK-NEON: Cost Model: Found an estimated cost of 68 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 68 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
 ;
   call void @llvm.masked.scatter.v4i8.v4p0i8(<4 x i8> %val, <4 x i8*> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
   ret void
 }
 
 define void @scatter_store_4xi8_variable_mask(<4 x i8> %val, <4 x i8*> %ptrs, <4 x i1> %cond) {
 ; CHECK: scatter_store_4xi8_variable_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
+; CHECK-NEON: Cost Model: Found an estimated cost of 116 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 116 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0i8(
 ;
   call void @llvm.masked.scatter.v4i8.v4p0i8(<4 x i8> %val, <4 x i8*> %ptrs, i32 1, <4 x i1> %cond)
   ret void
 }
 
 declare <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*>, i32 immarg, <4 x i1>, <4 x i32>)
 define <4 x i32> @gather_load_4xi32_constant_mask(<4 x i32*> %ptrs) {
 ; CHECK: gather_load_4xi32_constant_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
+; CHECK-NEON: Cost Model: Found an estimated cost of 34 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 34 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
 ;
   %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
   ret <4 x i32> %lv
 }
 
 define <4 x i32> @gather_load_4xi32_variable_mask(<4 x i32*> %ptrs, <4 x i1> %cond) {
 ; CHECK: gather_load_4xi32_variable_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
+; CHECK-NEON: Cost Model: Found an estimated cost of 58 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 58 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32
 ;
   %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)
   ret <4 x i32> %lv
 }
 
 declare void @llvm.masked.scatter.v4i32.v4p0i32(<4 x i32>, <4 x i32*>, i32 immarg, <4 x i1>)
 define void @scatter_store_4xi32_constant_mask(<4 x i32> %val, <4 x i32*> %ptrs) {
 ; CHECK: scatter_store_4xi32_constant_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
+; CHECK-NEON: Cost Model: Found an estimated cost of 68 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 68 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
 ;
   call void @llvm.masked.scatter.v4i32.v4p0i32(<4 x i32> %val, <4 x i32*> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
   ret void
 }
 
 define void @scatter_store_4xi32_variable_mask(<4 x i32> %val, <4 x i32*> %ptrs, <4 x i1> %cond) {
 ; CHECK: scatter_store_4xi32_variable_mask
-; CHECK-NEON: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
+; CHECK-NEON: Cost Model: Found an estimated cost of 116 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 116 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32(
 ;
   call void @llvm.masked.scatter.v4i32.v4p0i32(<4 x i32> %val, <4 x i32*> %ptrs, i32 1, <4 x i1> %cond)
   ret void
 }
 
 declare <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*>, i32, <256 x i1>, <256 x i16>)
 define void @sve_gather_vls(<256 x i1> %v256i1mask) {
 ; CHECK-LABEL: 'sve_scatter_vls'
-; CHECK-NEON: Cost Model: Found an estimated cost of 1952 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 1952 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
+; CHECK-NEON: Cost Model: Found an estimated cost of 3904 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 3904 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
 entry:
   %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0i16(<256 x i16*> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
   ret void
 }
 
 declare <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*>, i32, <256 x i1>, <256 x float>)
 define void @sve_gather_vls_float(<256 x i1> %v256i1mask) {
 ; CHECK-LABEL: 'sve_gather_vls_float'
-; CHECK-NEON: Cost Model: Found an estimated cost of 1856 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 1856 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
+; CHECK-NEON: Cost Model: Found an estimated cost of 3712 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 3712 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
 entry:
   %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0f32(<256 x float*> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
   ret void
 }
 
 declare void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8>, <256 x i8*>, i32, <256 x i1>)
 define void @sve_scatter_vls(<256 x i1> %v256i1mask){
 ; CHECK-LABEL: 'sve_scatter_vls'
-; CHECK-NEON: Cost Model: Found an estimated cost of 2000 for instruction: call void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8> undef, <256 x i8*> undef, i32 0, <256 x i1> %v256i1mask)
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 2000 for instruction: call void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8> undef, <256 x i8*> undef, i32 0, <256 x i1> %v256i1mask)
+; CHECK-NEON: Cost Model: Found an estimated cost of 8000 for instruction: call void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8> undef, <256 x i8*> undef, i32 0, <256 x i1> %v256i1mask)
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 8000 for instruction: call void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8> undef, <256 x i8*> undef, i32 0, <256 x i1> %v256i1mask)
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 2560 for instruction: call void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8> undef, <256 x i8*> undef, i32 0, <256 x i1> %v256i1mask)
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 2560 for instruction: call void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8> undef, <256 x i8*> undef, i32 0, <256 x i1> %v256i1mask)
 entry:
   call void @llvm.masked.scatter.v256i8.v256p0i8(<256 x i8> undef, <256 x i8*> undef, i32 0, <256 x i1> %v256i1mask)
   ret void
 }
 
 declare void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half>, <512 x half*>, i32, <512 x i1>)
 define void @sve_scatter_vls_float(<512 x i1> %v512i1mask){
 ; CHECK-LABEL: 'sve_scatter_vls_float'
-; CHECK-NEON: Cost Model: Found an estimated cost of 3904 for instruction: call void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half> undef, <512 x half*> undef, i32 0, <512 x i1> %v512i1mask)
-; CHECK-SVE-128: Cost Model: Found an estimated cost of 3904 for instruction: call void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half> undef, <512 x half*> undef, i32 0, <512 x i1> %v512i1mask)
+; CHECK-NEON: Cost Model: Found an estimated cost of 15616 for instruction: call void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half> undef, <512 x half*> undef, i32 0, <512 x i1> %v512i1mask)
+; CHECK-SVE-128: Cost Model: Found an estimated cost of 15616 for instruction: call void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half> undef, <512 x half*> undef, i32 0, <512 x i1> %v512i1mask)
 ; CHECK-SVE-256: Cost Model: Found an estimated cost of 5120 for instruction: call void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half> undef, <512 x half*> undef, i32 0, <512 x i1> %v512i1mask)
 ; CHECK-SVE-512: Cost Model: Found an estimated cost of 5120 for instruction: call void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half> undef, <512 x half*> undef, i32 0, <512 x i1> %v512i1mask)
   call void @llvm.masked.scatter.v512f16.v512p0f16(<512 x half> undef, <512 x half*> undef, i32 0, <512 x i1> %v512i1mask)
   ret void
 }
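
The updated CHECK-NEON and CHECK-SVE-128 values can be cross-checked against the fixed-width arm of `getGatherScatterOverhead` (2 for gathers, 4 for scatters). The snippet below is a stand-alone sketch for that arithmetic only; `MemOp`, `fixedWidthOverhead` and `newFixedWidthCost` are hypothetical names, and the old costs are simply copied from the previous test expectations.

```cpp
#include <cassert>

enum class MemOp { Load, Store };

// Mirrors the UseSVE == false arm of getGatherScatterOverhead in the diff:
// scatters (stores) are scaled by 4, gathers (loads) by 2.
inline unsigned fixedWidthOverhead(MemOp Op) {
  return Op == MemOp::Store ? 4 : 2;
}

// New fixed-width cost = previously reported cost * overhead.
inline unsigned newFixedWidthCost(MemOp Op, unsigned OldCost) {
  return OldCost * fixedWidthOverhead(Op);
}

inline void checkAgainstTestExpectations() {
  // <4 x i8> and <4 x i32> gathers: constant mask, then variable mask.
  assert(newFixedWidthCost(MemOp::Load, 17) == 34);
  assert(newFixedWidthCost(MemOp::Load, 29) == 58);
  // <4 x i8> and <4 x i32> scatters: constant mask, then variable mask.
  assert(newFixedWidthCost(MemOp::Store, 17) == 68);
  assert(newFixedWidthCost(MemOp::Store, 29) == 116);
  // Wide fixed-length vectors.
  assert(newFixedWidthCost(MemOp::Load, 1952) == 3904);   // <256 x i16>
  assert(newFixedWidthCost(MemOp::Load, 1856) == 3712);   // <256 x float>
  assert(newFixedWidthCost(MemOp::Store, 2000) == 8000);  // <256 x i8>
  assert(newFixedWidthCost(MemOp::Store, 3904) == 15616); // <512 x half>
}
```

Every changed expectation in the test file is the old value times 2 (gather) or 4 (scatter), while the CHECK-SVE-256 and CHECK-SVE-512 lines are untouched.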