Diff 515254

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Show First 20 Lines • Show All 1,825 Lines • ▼ Show 20 Lines	InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
}		}

// TODO: Allow non-throughput costs that aren't binary.		// TODO: Allow non-throughput costs that aren't binary.
auto AdjustCost = [&CostKind](InstructionCost Cost) -> InstructionCost {		auto AdjustCost = [&CostKind](InstructionCost Cost) -> InstructionCost {
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return Cost == 0 ? 0 : 1;		return Cost == 0 ? 0 : 1;
return Cost;		return Cost;
};		};

		sdesmalenUnsubmitted Done Reply Inline Actions nit: this can be `hasSVEorSME` sdesmalen: nit: this can be `hasSVEorSME`
EVT SrcTy = TLI->getValueType(DL, Src);		EVT SrcTy = TLI->getValueType(DL, Src);
EVT DstTy = TLI->getValueType(DL, Dst);		EVT DstTy = TLI->getValueType(DL, Dst);

if (!SrcTy.isSimple() \|\| !DstTy.isSimple())		if (!SrcTy.isSimple() \|\| !DstTy.isSimple())
		sdesmalenUnsubmitted Done Reply Inline Actions When the target has SVE and the CCH is masked, is it worth just calling `getCastInstrCost` again but instead passing `CastContextHint::Normal` ? sdesmalen: When the target has SVE and the CCH is masked, is it worth just calling `getCastInstrCost`…
		david-armAuthorUnsubmitted Done Reply Inline Actions Good suggestion, thanks @sdesmalen ! david-arm: Good suggestion, thanks @sdesmalen !
return AdjustCost(		return AdjustCost(
BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));		BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));

static const TypeConversionCostTblEntry		static const TypeConversionCostTblEntry
ConversionTbl[] = {		ConversionTbl[] = {
{ ISD::TRUNCATE, MVT::v2i8, MVT::v2i64, 1}, // xtn		{ ISD::TRUNCATE, MVT::v2i8, MVT::v2i64, 1}, // xtn
{ ISD::TRUNCATE, MVT::v2i16, MVT::v2i64, 1}, // xtn		{ ISD::TRUNCATE, MVT::v2i16, MVT::v2i64, 1}, // xtn
{ ISD::TRUNCATE, MVT::v2i32, MVT::v2i64, 1}, // xtn		{ ISD::TRUNCATE, MVT::v2i32, MVT::v2i64, 1}, // xtn
▲ Show 20 Lines • Show All 293 Lines • ▼ Show 20 Lines	static const TypeConversionCostTblEntry FP16Tbl[] = {
{ISD::SINT_TO_FP, MVT::v16f16, MVT::v16i8, 4}, // 2 * sshl(2) + 2 * scvtf		{ISD::SINT_TO_FP, MVT::v16f16, MVT::v16i8, 4}, // 2 * sshl(2) + 2 * scvtf
};		};

if (ST->hasFullFP16())		if (ST->hasFullFP16())
if (const auto *Entry = ConvertCostTableLookup(		if (const auto *Entry = ConvertCostTableLookup(
FP16Tbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT()))		FP16Tbl, ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT()))
return AdjustCost(Entry->Cost);		return AdjustCost(Entry->Cost);

		// The BasicTTIImpl version only deals with CCH==TTI::CastContextHint::Normal,
		// but we also want to include the TTI::CastContextHint::Masked case too.
		if ((ISD == ISD::ZERO_EXTEND \|\| ISD == ISD::SIGN_EXTEND) &&
		CCH == TTI::CastContextHint::Masked && ST->hasSVEorSME() &&
		dmgreenUnsubmitted Not Done Reply Inline Actions You can drop the brackets around `CCH == TTI::CastContextHint::Masked` dmgreen: You can drop the brackets around `CCH == TTI::CastContextHint::Masked`
		TLI->isTypeLegal(DstTy))
		dmgreenUnsubmitted Done Reply Inline Actions Is this always true that they are equivalent? For a `nxv8i16 load zext to nxv8i32` (without masking) you can convert it into a pair of extending load (each cost 1, so load+zext costs 2 total). The same can't be done for `nxv8i16 masked_load zext to nxv8i32` without either converting the nxv8i1 mask to two nxv4i1 masks, or zext a single load with a pair of uunpk's. (For MVE both are expensive so we give the instruction a high cost, preferring lower vector factors). dmgreen: Is this always true that they are equivalent? For a `nxv8i16 load zext to nxv8i32` (without…
		david-armAuthorUnsubmitted Done Reply Inline Actions Good point! I think the new version should fix that. david-arm: Good point! I think the new version should fix that.
		david-armAuthorUnsubmitted Done Reply Inline Actions This is in BasicTTIImpl::getCastInstrCost: // If we are legalizing by splitting, query the concrete TTI for the cost // of casting the original vector twice. We also need to factor in the // cost of the split itself. Count that as 1, to be consistent with // getTypeLegalizationCost(). bool SplitSrc = TLI->getTypeAction(Src->getContext(), TLI->getValueType(DL, Src)) == TargetLowering::TypeSplitVector; bool SplitDst = TLI->getTypeAction(Dst->getContext(), TLI->getValueType(DL, Dst)) == TargetLowering::TypeSplitVector; if ((SplitSrc \|\| SplitDst) && SrcVTy->getElementCount().isVector() && DstVTy->getElementCount().isVector()) { Type SplitDstTy = VectorType::getHalfElementsVectorType(DstVTy); Type SplitSrcTy = VectorType::getHalfElementsVectorType(SrcVTy); T TTI = static_cast<T >(this); // If both types need to be split then the split is free. InstructionCost SplitCost = (!SplitSrc \|\| !SplitDst) ? TTI->getVectorSplitCost() : 0; return SplitCost + (2 * TTI->getCastInstrCost(Opcode, SplitDstTy, SplitSrcTy, CCH, CostKind, I)); } which explains what's happening. It splits the types, then recalculates the zext/sext when the dest is a legal type. It just so happens that this becomes a legal extending load, which is correct! So the extend is absorbed into each load and becomes free. The only cost is then the `SplitCost`, which I guess could account for the additional load required. david-arm: This is in BasicTTIImpl::getCastInstrCost: ``` // If we are legalizing by splitting…
		CCH = TTI::CastContextHint::Normal;

return AdjustCost(		return AdjustCost(
BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));		BaseT::getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I));
}		}

InstructionCost AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode,		InstructionCost AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode,
Type *Dst,		Type *Dst,
VectorType *VecTy,		VectorType *VecTy,
unsigned Index) {		unsigned Index) {
▲ Show 20 Lines • Show All 1,339 Lines • Show Last 20 Lines

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll

Show First 20 Lines • Show All 104 Lines • ▼ Show 20 Lines	entry:

ret void		ret void
}		}


define void @scalable_ext_loads() {		define void @scalable_ext_loads() {
; CHECK-LABEL: 'scalable_ext_loads'		; CHECK-LABEL: 'scalable_ext_loads'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %zext.nxv16i8to16 = zext <vscale x 16 x i8> %load.nxv16i8 to <vscale x 16 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %zext.nxv16i8to16 = zext <vscale x 16 x i8> %load.nxv16i8 to <vscale x 16 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv16i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv16i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %zext.nxv16i8to32 = zext <vscale x 16 x i8> %load.nxv16i8.2 to <vscale x 16 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %zext.nxv16i8to32 = zext <vscale x 16 x i8> %load.nxv16i8.2 to <vscale x 16 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %zext.nxv16i8to64 = zext <vscale x 16 x i8> %load.nxv16i8.3 to <vscale x 16 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %zext.nxv16i8to64 = zext <vscale x 16 x i8> %load.nxv16i8.3 to <vscale x 16 x i64>
		dmgreenUnsubmitted Not Done Reply Inline Actions Do you know why this has gone from 3 to 1, if the Dst type is not legal? I think I would expect the score to be 2! dmgreen: Do you know why this has gone from 3 to 1, if the Dst type is not legal? I think I would…
		david-armAuthorUnsubmitted Done Reply Inline Actions I assumed this was because it was deciding to create two legal extending loads and then stitch the results together, i.e. reinvoke getCastInstrCost with the type split in half. %load.nxv88.1 = call <vscale x 16 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef) %zext.nxv16i8to16.1 = zext <vscale x 8 x i8> %load.nxv8i8.1 to <vscale x 8 x i16> %load.nxv8i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef) %zext.nxv16i8to16.2 = zext <vscale x 8 x i8> %load.nxv8i8.2 to <vscale x 8 x i16> %zext.nxv16i8to16 = concat (%load.nxv8i8.1, %load.nxv8i8.2) so that the only cost of the zext is just the concat, but it's a good question. I can double check. david-arm: I assumed this was because it was deciding to create two legal extending loads and then stitch…
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %zext.nxv8i16to32 = zext <vscale x 8 x i16> %load.nxv8i16 to <vscale x 8 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %zext.nxv8i16to32 = zext <vscale x 8 x i16> %load.nxv8i16 to <vscale x 8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %zext.nxv8i16to64 = zext <vscale x 8 x i16> %load.nxv8i16.2 to <vscale x 8 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %zext.nxv8i16to64 = zext <vscale x 8 x i16> %load.nxv8i16.2 to <vscale x 8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %zext.nxv4i16to32 = zext <vscale x 4 x i16> %load.nxv4i16 to <vscale x 4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %zext.nxv4i16to32 = zext <vscale x 4 x i16> %load.nxv4i16 to <vscale x 4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %zext.nxv2i16to64 = zext <vscale x 2 x i16> %load.nxv2i16 to <vscale x 2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %zext.nxv2i16to64 = zext <vscale x 2 x i16> %load.nxv2i16 to <vscale x 2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %zext.nxv4i32to64 = zext <vscale x 4 x i32> %load.nxv4i32 to <vscale x 4 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %zext.nxv4i32to64 = zext <vscale x 4 x i32> %load.nxv4i32 to <vscale x 4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %zext.nxv2i32to64 = zext <vscale x 2 x i32> %load.nxv2i32 to <vscale x 2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %zext.nxv2i32to64 = zext <vscale x 2 x i32> %load.nxv2i32 to <vscale x 2 x i64>

; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %sext.nxv16i8to16 = sext <vscale x 16 x i8> %load2.nxv16i8 to <vscale x 16 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %sext.nxv16i8to16 = sext <vscale x 16 x i8> %load2.nxv16i8 to <vscale x 16 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv16i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv16i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %sext.nxv16i8to32 = sext <vscale x 16 x i8> %load2.nxv16i8.2 to <vscale x 16 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %sext.nxv16i8to32 = sext <vscale x 16 x i8> %load2.nxv16i8.2 to <vscale x 16 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %sext.nxv16i8to64 = sext <vscale x 16 x i8> %load2.nxv16i8.3 to <vscale x 16 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %sext.nxv16i8to64 = sext <vscale x 16 x i8> %load2.nxv16i8.3 to <vscale x 16 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %sext.nxv8i8to16 = sext <vscale x 8 x i8> %load2.nxv8i8 to <vscale x 8 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %sext.nxv8i8to16 = sext <vscale x 8 x i8> %load2.nxv8i8 to <vscale x 8 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %sext.nxv4i8to32 = sext <vscale x 4 x i8> %load2.nxv4i8 to <vscale x 4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %sext.nxv4i8to32 = sext <vscale x 4 x i8> %load2.nxv4i8 to <vscale x 4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %sext.nxv2i8to64 = sext <vscale x 2 x i8> %load2.nxv2i8 to <vscale x 2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %sext.nxv2i8to64 = sext <vscale x 2 x i8> %load2.nxv2i8 to <vscale x 2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %sext.nxv8i16to32 = sext <vscale x 8 x i16> %load2.nxv8i16 to <vscale x 8 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %sext.nxv8i16to32 = sext <vscale x 8 x i16> %load2.nxv8i16 to <vscale x 8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %sext.nxv8i16to64 = sext <vscale x 8 x i16> %load2.nxv8i16.2 to <vscale x 8 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %sext.nxv8i16to64 = sext <vscale x 8 x i16> %load2.nxv8i16.2 to <vscale x 8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv4i16 = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %sext.nxv4i16to32 = sext <vscale x 4 x i16> %load2.nxv4i16 to <vscale x 4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %sext.nxv4i16to32 = sext <vscale x 4 x i16> %load2.nxv4i16 to <vscale x 4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv2i16 = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i16> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %sext.nxv2i16to64 = sext <vscale x 2 x i16> %load2.nxv2i16 to <vscale x 2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %sext.nxv2i16to64 = sext <vscale x 2 x i16> %load2.nxv2i16 to <vscale x 2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv4i32 = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i32> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %sext.nxv4i32to64 = sext <vscale x 4 x i32> %load2.nxv4i32 to <vscale x 4 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %sext.nxv4i32to64 = sext <vscale x 4 x i32> %load2.nxv4i32 to <vscale x 4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %load2.nxv2i32 = call <vscale x 2 x i32> @llvm.masked.load.nxv2i32.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i32> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %sext.nxv2i32to64 = sext <vscale x 2 x i32> %load2.nxv2i32 to <vscale x 2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %sext.nxv2i32to64 = sext <vscale x 2 x i32> %load2.nxv2i32 to <vscale x 2 x i64>

%load.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		%load.nxv16i8 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
%zext.nxv16i8to16 = zext <vscale x 16 x i8> %load.nxv16i8 to <vscale x 16 x i16>		%zext.nxv16i8to16 = zext <vscale x 16 x i8> %load.nxv16i8 to <vscale x 16 x i16>
%load.nxv16i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		%load.nxv16i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
%zext.nxv16i8to32 = zext <vscale x 16 x i8> %load.nxv16i8.2 to <vscale x 16 x i32>		%zext.nxv16i8to32 = zext <vscale x 16 x i8> %load.nxv16i8.2 to <vscale x 16 x i32>
%load.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)		%load.nxv16i8.3 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr undef, i32 8, <vscale x 16 x i1> undef, <vscale x 16 x i8> undef)
%zext.nxv16i8to64 = zext <vscale x 16 x i8> %load.nxv16i8.3 to <vscale x 16 x i64>		%zext.nxv16i8to64 = zext <vscale x 16 x i8> %load.nxv16i8.3 to <vscale x 16 x i64>
%load.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)		%load.nxv8i8 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
		dmgreenUnsubmitted Done Reply Inline Actions If I'm reading this correctly, there are tests here for loading smaller types and extending them to legal types, but none for loading legal types and extending them. They might be worth adding. dmgreen: If I'm reading this correctly, there are tests here for loading smaller types and extending…
%zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>		%zext.nxv8i8to16 = zext <vscale x 8 x i8> %load.nxv8i8 to <vscale x 8 x i16>
%load.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)		%load.nxv4i8 = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr undef, i32 8, <vscale x 4 x i1> undef, <vscale x 4 x i8> undef)
%zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>		%zext.nxv4i8to32 = zext <vscale x 4 x i8> %load.nxv4i8 to <vscale x 4 x i32>
%load.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)		%load.nxv2i8 = call <vscale x 2 x i8> @llvm.masked.load.nxv2i8.p0(ptr undef, i32 8, <vscale x 2 x i1> undef, <vscale x 2 x i8> undef)
%zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>		%zext.nxv2i8to64 = zext <vscale x 2 x i8> %load.nxv2i8 to <vscale x 2 x i64>
%load.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)		%load.nxv8i16 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
%zext.nxv8i16to32 = zext <vscale x 8 x i16> %load.nxv8i16 to <vscale x 8 x i32>		%zext.nxv8i16to32 = zext <vscale x 8 x i16> %load.nxv8i16 to <vscale x 8 x i32>
%load.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)		%load.nxv8i16.2 = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i16> undef)
▲ Show 20 Lines • Show All 78 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CostModel] Make sext/zext free if folded into a masked load
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 515254

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CostModel] Make sext/zext free if folded into a masked loadClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 515254

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll

[AArch64][CostModel] Make sext/zext free if folded into a masked load
ClosedPublic