Diff 374241

llvm/lib/Target/AArch64/AArch64Subtarget.h

Show First 20 Lines • Show All 266 Lines • ▼ Show 20 Lines	protected:

// CustomCallUsedXRegister[i] - X#i call saved.		// CustomCallUsedXRegister[i] - X#i call saved.
BitVector CustomCallSavedXRegs;		BitVector CustomCallSavedXRegs;

bool IsLittle;		bool IsLittle;

unsigned MinSVEVectorSizeInBits;		unsigned MinSVEVectorSizeInBits;
unsigned MaxSVEVectorSizeInBits;		unsigned MaxSVEVectorSizeInBits;

		sdesmalenUnsubmitted Not Done Reply Inline Actions nit: This may be personal preference, but would `VScaleForCPU` be a more suitable name? sdesmalen: nit: This may be personal preference, but would `VScaleForCPU` be a more suitable name?
		david-armAuthorUnsubmitted Done Reply Inline Actions I don't mind really. I don't have a strong preference. I guess the reason I used VScaleForTuning was just to make it clear that this is not an indication of what the exact vscale will be at runtime, since it could be anything from 1 -> max vscale for the CPU. Rather it's just an indication of how we'd like to tune the vectoriser, optimisations and codegen. However, happy to change it if you prefer it! david-arm: I don't mind really. I don't have a strong preference. I guess the reason I used…
		sdesmalenUnsubmitted Not Done Reply Inline Actions Okay fair enough, maybe 'VScaleForTuning' is more appropriate. sdesmalen: Okay fair enough, maybe 'VScaleForTuning' is more appropriate.
/// TargetTriple - What processor and OS we're targeting.		/// TargetTriple - What processor and OS we're targeting.
Triple TargetTriple;		Triple TargetTriple;

AArch64FrameLowering FrameLowering;		AArch64FrameLowering FrameLowering;
AArch64InstrInfo InstrInfo;		AArch64InstrInfo InstrInfo;
AArch64SelectionDAGInfo TSInfo;		AArch64SelectionDAGInfo TSInfo;
AArch64TargetLowering TLInfo;		AArch64TargetLowering TLInfo;

▲ Show 20 Lines • Show All 352 Lines • ▼ Show 20 Lines	public:
}		}

unsigned getMinSVEVectorSizeInBits() const {		unsigned getMinSVEVectorSizeInBits() const {
assert(HasSVE && "Tried to get SVE vector length without SVE support!");		assert(HasSVE && "Tried to get SVE vector length without SVE support!");
return MinSVEVectorSizeInBits;		return MinSVEVectorSizeInBits;
}		}

bool useSVEForFixedLengthVectors() const;		bool useSVEForFixedLengthVectors() const;

		/// Returns the value of vscale we should use for tuning the cost model
		/// using attributes found in Function \p F. If \p F is nullptr then return
		/// a default of 16.
		unsigned getVScaleForTuning(const Function *F) const;
};		};
} // End llvm namespace		} // End llvm namespace

#endif		#endif

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

Show First 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	case Kryo:
MaxPrefetchIterationsAhead = 11;		MaxPrefetchIterationsAhead = 11;
// FIXME: remove this to enable 64-bit SLP if performance looks good.		// FIXME: remove this to enable 64-bit SLP if performance looks good.
MinVectorRegisterBitWidth = 128;		MinVectorRegisterBitWidth = 128;
break;		break;
case NeoverseE1:		case NeoverseE1:
PrefFunctionLogAlignment = 3;		PrefFunctionLogAlignment = 3;
break;		break;
case NeoverseN1:		case NeoverseN1:
case NeoverseN2:		case NeoverseN2:
		sdesmalenUnsubmitted Not Done Reply Inline Actions nit: seems unrelated? sdesmalen: nit: seems unrelated?
		david-armAuthorUnsubmitted Done Reply Inline Actions I had to split this out as a separate case to avoid the fallthrough into the NeoverseN2 case that's all. david-arm: I had to split this out as a separate case to avoid the fallthrough into the NeoverseN2 case…
case NeoverseV1:		case NeoverseV1:
PrefFunctionLogAlignment = 4;		PrefFunctionLogAlignment = 4;
break;		break;
case Saphira:		case Saphira:
MaxInterleaveFactor = 4;		MaxInterleaveFactor = 4;
// FIXME: remove this to enable 64-bit SLP if performance looks good.		// FIXME: remove this to enable 64-bit SLP if performance looks good.
MinVectorRegisterBitWidth = 128;		MinVectorRegisterBitWidth = 128;
break;		break;
Show All 31 Lines	case ThunderX3T110:
PrefetchDistance = 128;		PrefetchDistance = 128;
MinPrefetchStride = 1024;		MinPrefetchStride = 1024;
MaxPrefetchIterationsAhead = 4;		MaxPrefetchIterationsAhead = 4;
// FIXME: remove this to enable 64-bit SLP if performance looks good.		// FIXME: remove this to enable 64-bit SLP if performance looks good.
MinVectorRegisterBitWidth = 128;		MinVectorRegisterBitWidth = 128;
break;		break;
}		}
}		}

		dmgreenUnsubmitted Done Reply Inline Actions I don't think these should be needed, as the above ARMProcFamily should be based on TuneCPU after D110258/D111551. dmgreen: I don't think these should be needed, as the above ARMProcFamily should be based on TuneCPU…
		david-armAuthorUnsubmitted Done Reply Inline Actions Hi @dmgreen, I was quite worried that D111551 will hold up this patch and I don't really see why it has to, therefore I have kept them independent. I was thinking that once D111551 lands then I can fix up this code and use ARMProcFamily instead. We'd really like to get the cost model changes in asap because it's preventing vectorisation on machines with SVE hardware. I feel that D111551 is a nice thing to have, but not a requirement for the cost model. I also expect it will take a lot longer to get approved. :) david-arm: Hi @dmgreen, I was quite worried that D111551 will hold up this patch and I don't really see…
		dmgreenUnsubmitted Not Done Reply Inline Actions I'm not sure I see why we need to rush with a suboptimal solution. SVE autovec was not enabled, and this patch just enabled some extra gather/scatters and reductions. D111551 doesn't look like it should take too long either, but maybe I'm being optimistic there. If you want it to land sooner, you could just remove all the references to tune-cpu in this patch. That way it works with -mcpu, the same as every other tuning feature has in llvm for the last 10 year :) It should then automatically become a tuning features when those other two patches are done. dmgreen: I'm not sure I see why we need to rush with a suboptimal solution. SVE autovec was not enabled…
AArch64Subtarget::AArch64Subtarget(const Triple &TT, const std::string &CPU,		AArch64Subtarget::AArch64Subtarget(const Triple &TT, const std::string &CPU,
const std::string &FS,		const std::string &FS,
const TargetMachine &TM, bool LittleEndian,		const TargetMachine &TM, bool LittleEndian,
unsigned MinSVEVectorSizeInBitsOverride,		unsigned MinSVEVectorSizeInBitsOverride,
unsigned MaxSVEVectorSizeInBitsOverride)		unsigned MaxSVEVectorSizeInBitsOverride)
: AArch64GenSubtargetInfo(TT, CPU, /TuneCPU/ CPU, FS),		: AArch64GenSubtargetInfo(TT, CPU, /TuneCPU/ CPU, FS),
ReserveXRegister(AArch64::GPR64commonRegClass.getNumRegs()),		ReserveXRegister(AArch64::GPR64commonRegClass.getNumRegs()),
CustomCallSavedXRegs(AArch64::GPR64commonRegClass.getNumRegs()),		CustomCallSavedXRegs(AArch64::GPR64commonRegClass.getNumRegs()),
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines
}		}

bool AArch64Subtarget::useSVEForFixedLengthVectors() const {		bool AArch64Subtarget::useSVEForFixedLengthVectors() const {
// Prefer NEON unless larger SVE registers are available.		// Prefer NEON unless larger SVE registers are available.
return hasSVE() && getMinSVEVectorSizeInBits() >= 256;		return hasSVE() && getMinSVEVectorSizeInBits() >= 256;
}		}

bool AArch64Subtarget::useAA() const { return UseAA; }		bool AArch64Subtarget::useAA() const { return UseAA; }

		unsigned AArch64Subtarget::getVScaleForTuning(const Function *F) const {
		unsigned MaxVScale = 16;
		// Bail out if we don't have a Function
		sdesmalenUnsubmitted Done Reply Inline Actions nit: redundant comment, the code says the same thing, just shorter :) sdesmalen: nit: redundant comment, the code says the same thing, just shorter :)
		if (!F)
		return MaxVScale;

		Attribute CPUAttr = F->getFnAttribute("target-cpu");
		Attribute TuneAttr = F->getFnAttribute("tune-cpu");

		std::string CPU =
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code - std::string CPU = - CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : ""; + std::string CPU = CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : ""; Lint: Pre-merge checks: clang-format: please reformat the code ``` - std::string CPU = - CPUAttr.isValid() ?
		CPUAttr.isValid() ? CPUAttr.getValueAsString().str() : "";
		std::string TuneCPU =
		TuneAttr.isValid() ? TuneAttr.getValueAsString().str() : CPU;

		if (TuneCPU == "generic") return 4;
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code - if (TuneCPU == "generic") return 4; - else if (TuneCPU == "neoverse-v1") return 2; - else if (TuneCPU == "neoverse-n2") return 1; + if (TuneCPU == "generic") + return 4; + else if (TuneCPU == "neoverse-v1") + return 2; + else if (TuneCPU == "neoverse-n2") + return 1; Lint: Pre-merge checks: clang-format: please reformat the code ``` - if (TuneCPU == "generic") return 4; - else if…
		else if (TuneCPU == "neoverse-v1") return 2;
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: do not use 'else' after 'return' [llvm-else-after-return] not useful Lint: Pre-merge checks: clang-tidy: warning: do not use 'else' after 'return' [llvm-else-after-return] [[https://github.
		else if (TuneCPU == "neoverse-n2") return 1;
		dmgreenUnsubmitted Not Done Reply Inline Actions These per-cpu values should probably be subtarget features or defined in AArch64Subtarget::initializeProperties like the other alignments and whatnot. dmgreen: These per-cpu values should probably be subtarget features or defined in AArch64Subtarget…
		david-armAuthorUnsubmitted Done Reply Inline Actions Hi @dmgreen, I think that's perfectly sensible when you only have a target-cpu feature. I originally tried doing it that way, but realised if target-cpu=generic and tune-cpu=neoverse-n1 then the subtarget won't have max vscale set, since we chose a generic subtarget. david-arm: Hi @dmgreen, I think that's perfectly sensible when you only have a target-cpu feature. I…
		dmgreenUnsubmitted Done Reply Inline Actions Tune-cpu has never worked in the AArch64 backend, as far as I understand. Like I said in the other patch the subtarget features do not distinguish between performance features and architecture features. Having one feature work off tune-cpu whereas everything else in the backend works on target-cpu sounds wrong to me, without fixing it properly and making -mtune work as you would expect across the board. Which would be great but until then I would base everything off the cpu for consistency. If you do want to make tune-cpu work, look at how the X86 backend does it. This shouldn't have to look into the Functions target-cpu and tune-cpu attributes, it should be initialized with the correct CPU and TuneCPU's passed in to the subtarget. dmgreen: Tune-cpu has never worked in the AArch64 backend, as far as I understand. Like I said in the…

		// We couldn't find a CPU to use for tuning so let's fall back on the
		// vscale_range attribute instead.

		if (F && F->hasFnAttribute(Attribute::VScaleRange)) {
		sdesmalenUnsubmitted Done Reply Inline Actions F is already known to be != nullptr. For consistency, can you do the same as above here, get the attribute and then check whether it's valid (as opposed to checking whether the function has the attribute. sdesmalen: 1. F is already known to be != nullptr. 2. For consistency, can you do the same as above here…
		unsigned Arg =
		F->getFnAttribute(Attribute::VScaleRange).getVScaleRangeArgs().second;
		if (Arg > 0)
		MaxVScale = Arg;
		}

		return MaxVScale;
		}

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	public:
/// when scalarizing an operation for a vector with ElementCount \p VF.		/// when scalarizing an operation for a vector with ElementCount \p VF.
/// For scalable vectors this currently takes the most pessimistic view based		/// For scalable vectors this currently takes the most pessimistic view based
/// upon the maximum possible value for vscale.		/// upon the maximum possible value for vscale.
unsigned getMaxNumElements(ElementCount VF,		unsigned getMaxNumElements(ElementCount VF,
const Function *F = nullptr) const {		const Function *F = nullptr) const {
if (!VF.isScalable())		if (!VF.isScalable())
return VF.getFixedValue();		return VF.getFixedValue();

unsigned MaxNumVScale = 16;		return VF.getKnownMinValue() * ST->getVScaleForTuning(F);
if (F && F->hasFnAttribute(Attribute::VScaleRange)) {
unsigned VScaleMax =
F->getFnAttribute(Attribute::VScaleRange).getVScaleRangeArgs().second;
if (VScaleMax > 0)
MaxNumVScale = VScaleMax;
}

return MaxNumVScale * VF.getKnownMinValue();
}		}

unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);

InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
Align Alignment, unsigned AddressSpace,		Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

▲ Show 20 Lines • Show All 183 Lines • Show Last 20 Lines

llvm/test/Analysis/CostModel/AArch64/sve-gather.ll

	; Check getIntrinsicInstrCost in BasicTTIImpl.h for masked gather			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; RUN: opt -analyze -cost-model < %s \| FileCheck %s
				; RUN: opt -analyze -cost-model -mcpu=neoverse-v1 < %s \| FileCheck %s --check-prefix=CHECK-CPU-NEOVERSE-V1
				; RUN: opt -analyze -cost-model -mcpu=neoverse-n2 < %s \| FileCheck %s --check-prefix=CHECK-CPU-NEOVERSE-N2

	; RUN: opt -cost-model -analyze -mtriple=aarch64--linux-gnu -mattr=+sve < %s \| FileCheck %s			target triple="aarch64--linux-gnu"

	define void @masked_gathers(<vscale x 4 x i1> %nxv4i1mask, <vscale x 8 x i1> %nxv8i1mask, <4 x i1> %v4i1mask, <1 x i1> %v1i1mask, <vscale x 1 x i1> %nxv1i1mask) vscale_range(0, 16) {			define void @masked_gathers(<vscale x 4 x i1> %nxv4i1mask, <vscale x 8 x i1> %nxv8i1mask, <4 x i1> %v4i1mask, <1 x i1> %v1i1mask, <vscale x 1 x i1> %nxv1i1mask) #0 {
	; CHECK-LABEL: 'masked_gathers'			; CHECK-LABEL: 'masked_gathers'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32.nxv8p0i32			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32.nxv8p0i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %res.v4i32 = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32			; CHECK-NEXT: Cost Model: Invalid cost for instruction: %res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %res.v1i128 = call <1 x i128> @llvm.masked.gather.v1i128.v1p0i128			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	; CHECK-NEXT: Cost Model: Invalid cost for instruction: %res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64			;
				; CHECK-CPU-NEOVERSE-V1-LABEL: 'masked_gathers'
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32.nxv8p0i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Invalid cost for instruction: %res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-N2-LABEL: 'masked_gathers'
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32.nxv8p0i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Invalid cost for instruction: %res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
	%res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)			%res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
	%res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)			%res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
	%res.v4i32 = call <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> undef, i32 0, <4 x i1> %v4i1mask, <4 x i32> zeroinitializer)			%res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
	%res.v1i128 = call <1 x i128> @llvm.masked.gather.v1i128.v1p0i128(<1 x i128*> undef, i32 0, <1 x i1> %v1i1mask, <1 x i128> zeroinitializer)
	%res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
	ret void			ret void
	}			}

	define void @masked_gathers_no_vscale_range() {			define void @masked_gathers_tune_generic(<vscale x 4 x i1> %nxv4i1mask, <vscale x 8 x i1> %nxv8i1mask, <4 x i1> %v4i1mask, <1 x i1> %v1i1mask, <vscale x 1 x i1> %nxv1i1mask) #1 {
				; CHECK-LABEL: 'masked_gathers_tune_generic'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32.nxv8p0i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: Cost Model: Invalid cost for instruction: %res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-V1-LABEL: 'masked_gathers_tune_generic'
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32.nxv8p0i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Invalid cost for instruction: %res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-N2-LABEL: 'masked_gathers_tune_generic'
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32.nxv8p0i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Invalid cost for instruction: %res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				%res.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32(<vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask, <vscale x 4 x i32> zeroinitializer)
				%res.nxv8i32 = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32(<vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask, <vscale x 8 x i32> zeroinitializer)
				%res.nxv1i64 = call <vscale x 1 x i64> @llvm.masked.gather.nxv1i64(<vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask, <vscale x 1 x i64> zeroinitializer)
				ret void
				}

				define void @masked_gathers_no_vscale_range() #2 {
	; CHECK-LABEL: 'masked_gathers_no_vscale_range'			; CHECK-LABEL: 'masked_gathers_no_vscale_range'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv4f64 = call <vscale x 4 x double> @llvm.masked.gather.nxv4f64.nxv4p0f64(<vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv4f64 = call <vscale x 4 x double> @llvm.masked.gather.nxv4f64.nxv4p0f64(<vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv2f64 = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64.nxv2p0f64(<vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv2f64 = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64.nxv2p0f64(<vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %res.nxv8f32 = call <vscale x 8 x float> @llvm.masked.gather.nxv8f32.nxv8p0f32(<vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %res.nxv8f32 = call <vscale x 8 x float> @llvm.masked.gather.nxv8f32.nxv8p0f32(<vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv4f32 = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv4f32 = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv2f32 = call <vscale x 2 x float> @llvm.masked.gather.nxv2f32.nxv2p0f32(<vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv2f32 = call <vscale x 2 x float> @llvm.masked.gather.nxv2f32.nxv2p0f32(<vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 256 for instruction: %res.nxv16i16 = call <vscale x 16 x i16> @llvm.masked.gather.nxv16i16.nxv16p0i16(<vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef, <vscale x 16 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 256 for instruction: %res.nxv16i16 = call <vscale x 16 x i16> @llvm.masked.gather.nxv16i16.nxv16p0i16(<vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef, <vscale x 16 x i16> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %res.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.gather.nxv8i16.nxv8p0i16(<vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %res.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.gather.nxv8i16.nxv8p0i16(<vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0i16(<vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %res.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0i16(<vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-V1-LABEL: 'masked_gathers_no_vscale_range'
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res.nxv4f64 = call <vscale x 4 x double> @llvm.masked.gather.nxv4f64.nxv4p0f64(<vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x double> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %res.nxv2f64 = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64.nxv2p0f64(<vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x double> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %res.nxv8f32 = call <vscale x 8 x float> @llvm.masked.gather.nxv8f32.nxv8p0f32(<vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x float> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res.nxv4f32 = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x float> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %res.nxv2f32 = call <vscale x 2 x float> @llvm.masked.gather.nxv2f32.nxv2p0f32(<vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x float> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %res.nxv16i16 = call <vscale x 16 x i16> @llvm.masked.gather.nxv16i16.nxv16p0i16(<vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef, <vscale x 16 x i16> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %res.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.gather.nxv8i16.nxv8p0i16(<vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0i16(<vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-N2-LABEL: 'masked_gathers_no_vscale_range'
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %res.nxv4f64 = call <vscale x 4 x double> @llvm.masked.gather.nxv4f64.nxv4p0f64(<vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x double> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %res.nxv2f64 = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64.nxv2p0f64(<vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x double> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res.nxv8f32 = call <vscale x 8 x float> @llvm.masked.gather.nxv8f32.nxv8p0f32(<vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x float> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %res.nxv4f32 = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x float> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %res.nxv2f32 = call <vscale x 2 x float> @llvm.masked.gather.nxv2f32.nxv2p0f32(<vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x float> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %res.nxv16i16 = call <vscale x 16 x i16> @llvm.masked.gather.nxv16i16.nxv16p0i16(<vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef, <vscale x 16 x i16> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.gather.nxv8i16.nxv8p0i16(<vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %res.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0i16(<vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
	%res.nxv4f64 = call <vscale x 4 x double> @llvm.masked.gather.nxv4f64(<vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x double> undef)			%res.nxv4f64 = call <vscale x 4 x double> @llvm.masked.gather.nxv4f64(<vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x double> undef)
	%res.nxv2f64 = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64(<vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x double> undef)			%res.nxv2f64 = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64(<vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x double> undef)

	%res.nxv8f32 = call <vscale x 8 x float> @llvm.masked.gather.nxv8f32(<vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x float> undef)			%res.nxv8f32 = call <vscale x 8 x float> @llvm.masked.gather.nxv8f32(<vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x float> undef)
	%res.nxv4f32 = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32(<vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x float> undef)			%res.nxv4f32 = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32(<vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x float> undef)
	%res.nxv2f32 = call <vscale x 2 x float> @llvm.masked.gather.nxv2f32(<vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x float> undef)			%res.nxv2f32 = call <vscale x 2 x float> @llvm.masked.gather.nxv2f32(<vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef, <vscale x 2 x float> undef)

	%res.nxv16i16 = call <vscale x 16 x i16> @llvm.masked.gather.nxv16i16(<vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef, <vscale x 16 x i16> undef)			%res.nxv16i16 = call <vscale x 16 x i16> @llvm.masked.gather.nxv16i16(<vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef, <vscale x 16 x i16> undef)
	%res.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.gather.nxv8i16(<vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)			%res.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.gather.nxv8i16(<vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
	%res.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16(<vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)			%res.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16(<vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)

	ret void			ret void
	}			}

				attributes #0 = { "target-features"="+sve" vscale_range(0, 8) }
				attributes #1 = { "target-features"="+sve" vscale_range(0, 16) "tune-cpu"="generic" }
				attributes #2 = { "target-features"="+sve" }

	declare <vscale x 4 x i32> @llvm.masked.gather.nxv4i32(<vscale x 4 x i32*>, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)			declare <vscale x 4 x i32> @llvm.masked.gather.nxv4i32(<vscale x 4 x i32*>, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)
	declare <vscale x 8 x i32> @llvm.masked.gather.nxv8i32(<vscale x 8 x i32*>, i32, <vscale x 8 x i1>, <vscale x 8 x i32>)			declare <vscale x 8 x i32> @llvm.masked.gather.nxv8i32(<vscale x 8 x i32*>, i32, <vscale x 8 x i1>, <vscale x 8 x i32>)
	declare <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*>, i32, <4 x i1>, <4 x i32>)			declare <vscale x 1 x i64> @llvm.masked.gather.nxv1i64(<vscale x 1 x i64*>, i32, <vscale x 1 x i1>, <vscale x 1 x i64>)
	declare <1 x i128> @llvm.masked.gather.v1i128.v1p0i128(<1 x i128*>, i32, <1 x i1>, <1 x i128>)
	declare <vscale x 1 x i64> @llvm.masked.gather.nxv1i64.nxv1p0i64(<vscale x 1 x i64*>, i32, <vscale x 1 x i1>, <vscale x 1 x i64>)
	declare <vscale x 4 x double> @llvm.masked.gather.nxv4f64(<vscale x 4 x double*>, i32, <vscale x 4 x i1>, <vscale x 4 x double>)			declare <vscale x 4 x double> @llvm.masked.gather.nxv4f64(<vscale x 4 x double*>, i32, <vscale x 4 x i1>, <vscale x 4 x double>)
	declare <vscale x 2 x double> @llvm.masked.gather.nxv2f64(<vscale x 2 x double*>, i32, <vscale x 2 x i1>, <vscale x 2 x double>)			declare <vscale x 2 x double> @llvm.masked.gather.nxv2f64(<vscale x 2 x double*>, i32, <vscale x 2 x i1>, <vscale x 2 x double>)
	declare <vscale x 8 x float> @llvm.masked.gather.nxv8f32(<vscale x 8 x float*>, i32, <vscale x 8 x i1>, <vscale x 8 x float>)			declare <vscale x 8 x float> @llvm.masked.gather.nxv8f32(<vscale x 8 x float*>, i32, <vscale x 8 x i1>, <vscale x 8 x float>)
	declare <vscale x 4 x float> @llvm.masked.gather.nxv4f32(<vscale x 4 x float*>, i32, <vscale x 4 x i1>, <vscale x 4 x float>)			declare <vscale x 4 x float> @llvm.masked.gather.nxv4f32(<vscale x 4 x float*>, i32, <vscale x 4 x i1>, <vscale x 4 x float>)
	declare <vscale x 2 x float> @llvm.masked.gather.nxv2f32(<vscale x 2 x float*>, i32, <vscale x 2 x i1>, <vscale x 2 x float>)			declare <vscale x 2 x float> @llvm.masked.gather.nxv2f32(<vscale x 2 x float*>, i32, <vscale x 2 x i1>, <vscale x 2 x float>)
	declare <vscale x 16 x i16> @llvm.masked.gather.nxv16i16(<vscale x 16 x i16*>, i32, <vscale x 16 x i1>, <vscale x 16 x i16>)			declare <vscale x 16 x i16> @llvm.masked.gather.nxv16i16(<vscale x 16 x i16*>, i32, <vscale x 16 x i1>, <vscale x 16 x i16>)
	declare <vscale x 8 x i16> @llvm.masked.gather.nxv8i16(<vscale x 8 x i16*>, i32, <vscale x 8 x i1>, <vscale x 8 x i16>)			declare <vscale x 8 x i16> @llvm.masked.gather.nxv8i16(<vscale x 8 x i16*>, i32, <vscale x 8 x i1>, <vscale x 8 x i16>)
	declare <vscale x 4 x i16> @llvm.masked.gather.nxv4i16(<vscale x 4 x i16*>, i32, <vscale x 4 x i1>, <vscale x 4 x i16>)			declare <vscale x 4 x i16> @llvm.masked.gather.nxv4i16(<vscale x 4 x i16*>, i32, <vscale x 4 x i1>, <vscale x 4 x i16>)

llvm/test/Analysis/CostModel/AArch64/sve-scatter.ll

	; Check getIntrinsicInstrCost in BasicTTIImpl.h with for masked scatter			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; RUN: opt -analyze -cost-model < %s \| FileCheck %s
				; RUN: opt -analyze -cost-model -mcpu=neoverse-v1 < %s \| FileCheck %s --check-prefix=CHECK-CPU-NEOVERSE-V1
				; RUN: opt -analyze -cost-model -mcpu=neoverse-n2 < %s \| FileCheck %s --check-prefix=CHECK-CPU-NEOVERSE-N2

	; RUN: opt -cost-model -analyze -mtriple=aarch64--linux-gnu -mattr=+sve < %s \| FileCheck %s			target triple="aarch64--linux-gnu"

	define void @masked_scatters(<vscale x 4 x i1> %nxv4i1mask, <vscale x 8 x i1> %nxv8i1mask, <4 x i1> %v4i1mask, <1 x i1> %v1i1mask, <vscale x 1 x i1> %nxv1i1mask) vscale_range(0, 16) {			define void @masked_scatters(<vscale x 4 x i1> %nxv4i1mask, <vscale x 8 x i1> %nxv8i1mask, <4 x i1> %v4i1mask, <1 x i1> %v1i1mask, <vscale x 1 x i1> %nxv1i1mask) #0 {
	; CHECK-LABEL: 'masked_scatters'			; CHECK-LABEL: 'masked_scatters'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.scatter.nxv8i32.nxv8p0i32			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv8i32.nxv8p0i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i32.v4p0i32			; CHECK-NEXT: Cost Model: Invalid cost for instruction: call void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: call void @llvm.masked.scatter.v1i128.v1p0i128			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	; CHECK-NEXT: Cost Model: Invalid cost for instruction: call void @llvm.masked.scatter.nxv1i64.nxv1p0i64			;
				; CHECK-CPU-NEOVERSE-V1-LABEL: 'masked_scatters'
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.nxv8i32.nxv8p0i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Invalid cost for instruction: call void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-N2-LABEL: 'masked_scatters'
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.nxv8i32.nxv8p0i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Invalid cost for instruction: call void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
	call void @llvm.masked.scatter.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)			call void @llvm.masked.scatter.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
	call void @llvm.masked.scatter.nxv8i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)			call void @llvm.masked.scatter.nxv8i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
	call void @llvm.masked.scatter.v4i32(<4 x i32> undef, <4 x i32*> undef, i32 0, <4 x i1> %v4i1mask)			call void @llvm.masked.scatter.nxv1i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
	call void @llvm.masked.scatter.v1i128.v1p0i128(<1 x i128> undef, <1 x i128*> undef, i32 0, <1 x i1> %v1i1mask)
	call void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
	ret void			ret void
	}			}

	define void @masked_scatters_no_vscale_range() {			define void @masked_scatters_tune_generic(<vscale x 4 x i1> %nxv4i1mask, <vscale x 8 x i1> %nxv8i1mask, <4 x i1> %v4i1mask, <1 x i1> %v1i1mask, <vscale x 1 x i1> %nxv1i1mask) #1 {
				; CHECK-LABEL: 'masked_scatters_tune_generic'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv8i32.nxv8p0i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
				; CHECK-NEXT: Cost Model: Invalid cost for instruction: call void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-V1-LABEL: 'masked_scatters_tune_generic'
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv8i32.nxv8p0i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Invalid cost for instruction: call void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-N2-LABEL: 'masked_scatters_tune_generic'
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv8i32.nxv8p0i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Invalid cost for instruction: call void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				call void @llvm.masked.scatter.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i32*> undef, i32 0, <vscale x 4 x i1> %nxv4i1mask)
				call void @llvm.masked.scatter.nxv8i32(<vscale x 8 x i32> undef, <vscale x 8 x i32*> undef, i32 0, <vscale x 8 x i1> %nxv8i1mask)
				call void @llvm.masked.scatter.nxv1i64(<vscale x 1 x i64> undef, <vscale x 1 x i64*> undef, i32 0, <vscale x 1 x i1> %nxv1i1mask)
				ret void
				}

				define void @masked_scatters_no_vscale_range() #2 {
	; CHECK-LABEL: 'masked_scatters_no_vscale_range'			; CHECK-LABEL: 'masked_scatters_no_vscale_range'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv4f64.nxv4p0f64(<vscale x 4 x double> undef, <vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv4f64.nxv4p0f64(<vscale x 4 x double> undef, <vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv2f64.nxv2p0f64(<vscale x 2 x double> undef, <vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv2f64.nxv2p0f64(<vscale x 2 x double> undef, <vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.scatter.nxv8f32.nxv8p0f32(<vscale x 8 x float> undef, <vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.scatter.nxv8f32.nxv8p0f32(<vscale x 8 x float> undef, <vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv4f32.nxv4p0f32(<vscale x 4 x float> undef, <vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv4f32.nxv4p0f32(<vscale x 4 x float> undef, <vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv2f32.nxv2p0f32(<vscale x 2 x float> undef, <vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv2f32.nxv2p0f32(<vscale x 2 x float> undef, <vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 256 for instruction: call void @llvm.masked.scatter.nxv16i16.nxv16p0i16(<vscale x 16 x i16> undef, <vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 256 for instruction: call void @llvm.masked.scatter.nxv16i16.nxv16p0i16(<vscale x 16 x i16> undef, <vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.scatter.nxv8i16.nxv8p0i16(<vscale x 8 x i16> undef, <vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.scatter.nxv8i16.nxv8p0i16(<vscale x 8 x i16> undef, <vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv4i16.nxv4p0i16(<vscale x 4 x i16> undef, <vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.scatter.nxv4i16.nxv4p0i16(<vscale x 4 x i16> undef, <vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-V1-LABEL: 'masked_scatters_no_vscale_range'
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.nxv4f64.nxv4p0f64(<vscale x 4 x double> undef, <vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.nxv2f64.nxv2p0f64(<vscale x 2 x double> undef, <vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.nxv8f32.nxv8p0f32(<vscale x 8 x float> undef, <vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.nxv4f32.nxv4p0f32(<vscale x 4 x float> undef, <vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.nxv2f32.nxv2p0f32(<vscale x 2 x float> undef, <vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.scatter.nxv16i16.nxv16p0i16(<vscale x 16 x i16> undef, <vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.nxv8i16.nxv8p0i16(<vscale x 8 x i16> undef, <vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.nxv4i16.nxv4p0i16(<vscale x 4 x i16> undef, <vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef)
				; CHECK-CPU-NEOVERSE-V1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
				; CHECK-CPU-NEOVERSE-N2-LABEL: 'masked_scatters_no_vscale_range'
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.nxv4f64.nxv4p0f64(<vscale x 4 x double> undef, <vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: call void @llvm.masked.scatter.nxv2f64.nxv2p0f64(<vscale x 2 x double> undef, <vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.nxv8f32.nxv8p0f32(<vscale x 8 x float> undef, <vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.nxv4f32.nxv4p0f32(<vscale x 4 x float> undef, <vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: call void @llvm.masked.scatter.nxv2f32.nxv2p0f32(<vscale x 2 x float> undef, <vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.nxv16i16.nxv16p0i16(<vscale x 16 x i16> undef, <vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.nxv8i16.nxv8p0i16(<vscale x 8 x i16> undef, <vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.masked.scatter.nxv4i16.nxv4p0i16(<vscale x 4 x i16> undef, <vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef)
				; CHECK-CPU-NEOVERSE-N2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
				;
	call void @llvm.masked.scatter.nxv4f64(<vscale x 4 x double> undef, <vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef)			call void @llvm.masked.scatter.nxv4f64(<vscale x 4 x double> undef, <vscale x 4 x double*> undef, i32 1, <vscale x 4 x i1> undef)
	call void @llvm.masked.scatter.nxv2f64(<vscale x 2 x double> undef, <vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef)			call void @llvm.masked.scatter.nxv2f64(<vscale x 2 x double> undef, <vscale x 2 x double*> undef, i32 1, <vscale x 2 x i1> undef)

	call void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float> undef, <vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef)			call void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float> undef, <vscale x 8 x float*> undef, i32 1, <vscale x 8 x i1> undef)
	call void @llvm.masked.scatter.nxv4f32(<vscale x 4 x float> undef, <vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef)			call void @llvm.masked.scatter.nxv4f32(<vscale x 4 x float> undef, <vscale x 4 x float*> undef, i32 1, <vscale x 4 x i1> undef)
	call void @llvm.masked.scatter.nxv2f32(<vscale x 2 x float> undef, <vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef)			call void @llvm.masked.scatter.nxv2f32(<vscale x 2 x float> undef, <vscale x 2 x float*> undef, i32 1, <vscale x 2 x i1> undef)

	call void @llvm.masked.scatter.nxv16i16(<vscale x 16 x i16> undef, <vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef)			call void @llvm.masked.scatter.nxv16i16(<vscale x 16 x i16> undef, <vscale x 16 x i16*> undef, i32 1, <vscale x 16 x i1> undef)
	call void @llvm.masked.scatter.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef)			call void @llvm.masked.scatter.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i16*> undef, i32 1, <vscale x 8 x i1> undef)
	call void @llvm.masked.scatter.nxv4i16(<vscale x 4 x i16> undef, <vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef)			call void @llvm.masked.scatter.nxv4i16(<vscale x 4 x i16> undef, <vscale x 4 x i16*> undef, i32 1, <vscale x 4 x i1> undef)

	ret void			ret void
	}			}

				attributes #0 = { "target-features"="+sve" vscale_range(0, 8) }
				attributes #1 = { "target-features"="+sve" vscale_range(0, 16) "tune-cpu"="generic" }
				attributes #2 = { "target-features"="+sve" }

	declare void @llvm.masked.scatter.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32*>, i32, <vscale x 4 x i1>)			declare void @llvm.masked.scatter.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32*>, i32, <vscale x 4 x i1>)
	declare void @llvm.masked.scatter.nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32*>, i32, <vscale x 8 x i1>)			declare void @llvm.masked.scatter.nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32*>, i32, <vscale x 8 x i1>)
	declare void @llvm.masked.scatter.v4i32(<4 x i32>, <4 x i32*>, i32, <4 x i1>)			declare void @llvm.masked.scatter.nxv1i64(<vscale x 1 x i64>, <vscale x 1 x i64*>, i32, <vscale x 1 x i1>)
	declare void @llvm.masked.scatter.v1i128.v1p0i128(<1 x i128>, <1 x i128*>, i32, <1 x i1>)
	declare void @llvm.masked.scatter.nxv1i64.nxv1p0i64(<vscale x 1 x i64>, <vscale x 1 x i64*>, i32, <vscale x 1 x i1>)
	declare void @llvm.masked.scatter.nxv4f64(<vscale x 4 x double>, <vscale x 4 x double*>, i32, <vscale x 4 x i1>)			declare void @llvm.masked.scatter.nxv4f64(<vscale x 4 x double>, <vscale x 4 x double*>, i32, <vscale x 4 x i1>)
	declare void @llvm.masked.scatter.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double*>, i32, <vscale x 2 x i1>)			declare void @llvm.masked.scatter.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double*>, i32, <vscale x 2 x i1>)
	declare void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float>, <vscale x 8 x float*>, i32, <vscale x 8 x i1>)			declare void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float>, <vscale x 8 x float*>, i32, <vscale x 8 x i1>)
	declare void @llvm.masked.scatter.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float*>, i32, <vscale x 4 x i1>)			declare void @llvm.masked.scatter.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float*>, i32, <vscale x 4 x i1>)
	declare void @llvm.masked.scatter.nxv2f32(<vscale x 2 x float>, <vscale x 2 x float*>, i32, <vscale x 2 x i1>)			declare void @llvm.masked.scatter.nxv2f32(<vscale x 2 x float>, <vscale x 2 x float*>, i32, <vscale x 2 x i1>)
	declare void @llvm.masked.scatter.nxv16i16(<vscale x 16 x i16>, <vscale x 16 x i16*>, i32, <vscale x 16 x i1>)			declare void @llvm.masked.scatter.nxv16i16(<vscale x 16 x i16>, <vscale x 16 x i16*>, i32, <vscale x 16 x i1>)
	declare void @llvm.masked.scatter.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16*>, i32, <vscale x 8 x i1>)			declare void @llvm.masked.scatter.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16*>, i32, <vscale x 8 x i1>)
	declare void @llvm.masked.scatter.nxv4i16(<vscale x 4 x i16>, <vscale x 4 x i16*>, i32, <vscale x 4 x i1>)			declare void @llvm.masked.scatter.nxv4i16(<vscale x 4 x i16>, <vscale x 4 x i16*>, i32, <vscale x 4 x i1>)

This is an archive of the discontinued LLVM Phabricator instance.

[SVE][Analysis] Tune the cost model according to the tune-cpu attribute
ClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 374241

llvm/lib/Target/AArch64/AArch64Subtarget.h

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

llvm/test/Analysis/CostModel/AArch64/sve-gather.ll

llvm/test/Analysis/CostModel/AArch64/sve-scatter.ll

This is an archive of the discontinued LLVM Phabricator instance.

[SVE][Analysis] Tune the cost model according to the tune-cpu attributeClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 374241

llvm/lib/Target/AArch64/AArch64Subtarget.h

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

llvm/test/Analysis/CostModel/AArch64/sve-gather.ll

llvm/test/Analysis/CostModel/AArch64/sve-scatter.ll

[SVE][Analysis] Tune the cost model according to the tune-cpu attribute
ClosedPublic