This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CostModel] Make sext/zext free if folded into a masked load
ClosedPublic

Authored by david-arm on Apr 12 2023, 6:16 AM.

Details

Summary

The BasicTTIImpl implementation of getCastInstrCost ensures
that the cost of a zext/sext is 0 when it follows a load, provided
we know the combined extending load is legal. For SVE we can do
the same for masked loads too, since they use exactly the
same underlying instruction.
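
Roughly speaking, the BasicTTIImpl logic being referred to looks like this (a simplified sketch rather than a verbatim quote of the in-tree code, which has a few more guards):

// Inside BasicTTIImpl::getCastInstrCost (simplified sketch): a sext/zext
// whose operand is a load is reported as free if the target can fold the
// pair into a single extending load.
if (I && isa<LoadInst>(I->getOperand(0)) &&
    CCH == TTI::CastContextHint::Normal) {
  EVT ExtVT = EVT::getEVT(Dst);
  EVT LoadVT = EVT::getEVT(Src);
  unsigned LType =
      (Opcode == Instruction::ZExt) ? ISD::ZEXTLOAD : ISD::SEXTLOAD;
  if (TLI->isLoadExtLegal(LType, ExtVT, LoadVT))
    return 0; // The extend folds into the load, so it costs nothing.
}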

Diff Detail

Event Timeline

david-arm created this revision.Apr 12 2023, 6:16 AM
david-arm requested review of this revision.Apr 12 2023, 6:16 AM
Herald added a project: Restricted Project.Apr 12 2023, 6:16 AM
sdesmalen added inline comments.Apr 12 2023, 6:49 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1830

nit: this can be hasSVEorSME

1833–1834

When the target has SVE and the CCH is masked, is it worth just calling getCastInstrCost again, but passing CastContextHint::Normal instead?
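
In other words, something along these lines (a hypothetical sketch of the suggestion only; the exact guards in the committed change differ, and a later update restricts this to legal destination types):

// Hypothetical sketch for AArch64TTIImpl::getCastInstrCost: with SVE/SME an
// extending masked load uses the same instruction as an extending normal
// load, so re-query the cost with the Normal context hint. ST is the
// AArch64Subtarget member.
if (CCH == TTI::CastContextHint::Masked && ST->hasSVEorSME())
  return getCastInstrCost(Opcode, Dst, Src, TTI::CastContextHint::Normal,
                          CostKind, I);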

david-arm updated this revision to Diff 512847.Apr 12 2023, 8:23 AM
  • Address review comments
david-arm marked 2 inline comments as done.Apr 12 2023, 8:23 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1833–1834

Good suggestion, thanks @sdesmalen !

sdesmalen accepted this revision.Apr 12 2023, 8:43 AM
This revision is now accepted and ready to land.Apr 12 2023, 8:43 AM
dmgreen added inline comments.Apr 12 2023, 10:16 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2131

Is this always true that they are equivalent?

For an nxv8i16 load zext to nxv8i32 (without masking) you can convert it into a pair of extending loads (each costing 1, so the load+zext costs 2 in total).

The same can't be done for an nxv8i16 masked_load zext to nxv8i32 without either converting the nxv8i1 mask into two nxv4i1 masks, or zexting a single load with a pair of uunpk's. (For MVE both are expensive, so we give the instruction a high cost, preferring lower vectorization factors.)

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
138

If I'm reading this correctly, there are tests here for loading smaller types and extending them to legal types, but none for loading legal types and extending them. They might be worth adding.

david-arm updated this revision to Diff 514235.Apr 17 2023, 7:52 AM
david-arm marked an inline comment as done.
  • Only consider types where the destination is legal.
david-arm marked 2 inline comments as done.Apr 17 2023, 7:53 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2131

Good point! I think the new version should fix that.

dmgreen added inline comments.Apr 17 2023, 8:14 AM
llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
111–112

Do you know why this has gone from 3 to 1, if the Dst type is not legal? I think I would expect the score to be 2!

Matt added a subscriber: Matt.Apr 17 2023, 9:38 AM
david-arm marked an inline comment as done.Apr 18 2023, 1:01 AM
david-arm added inline comments.
llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll
111–112

I assumed this was because it was deciding to create two legal extending loads and then stitch the results together, i.e. reinvoke getCastInstrCost with the type split in half.

%load.nxv8i8.1 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
%zext.nxv8i8to16.1 = zext <vscale x 8 x i8> %load.nxv8i8.1 to <vscale x 8 x i16>
%load.nxv8i8.2 = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef)
%zext.nxv8i8to16.2 = zext <vscale x 8 x i8> %load.nxv8i8.2 to <vscale x 8 x i16>
%zext.nxv16i8to16 = concat (%zext.nxv8i8to16.1, %zext.nxv8i8to16.2)

so that the only cost of the zext is just the concat, but it's a good question. I can double check.

david-arm added inline comments.Apr 18 2023, 1:17 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2131

This is in BasicTTIImpl::getCastInstrCost:

// If we are legalizing by splitting, query the concrete TTI for the cost
// of casting the original vector twice. We also need to factor in the
// cost of the split itself. Count that as 1, to be consistent with
// getTypeLegalizationCost().
bool SplitSrc =
    TLI->getTypeAction(Src->getContext(), TLI->getValueType(DL, Src)) ==
    TargetLowering::TypeSplitVector;
bool SplitDst =
    TLI->getTypeAction(Dst->getContext(), TLI->getValueType(DL, Dst)) ==
    TargetLowering::TypeSplitVector;
if ((SplitSrc || SplitDst) && SrcVTy->getElementCount().isVector() &&
    DstVTy->getElementCount().isVector()) {
  Type *SplitDstTy = VectorType::getHalfElementsVectorType(DstVTy);
  Type *SplitSrcTy = VectorType::getHalfElementsVectorType(SrcVTy);
  T *TTI = static_cast<T *>(this);
  // If both types need to be split then the split is free.
  InstructionCost SplitCost =
      (!SplitSrc || !SplitDst) ? TTI->getVectorSplitCost() : 0;
  return SplitCost +
         (2 * TTI->getCastInstrCost(Opcode, SplitDstTy, SplitSrcTy, CCH,
                                    CostKind, I));
}

which explains what's happening. It splits the types, then recalculates the zext/sext when the dest is a legal type. It just so happens that this becomes a legal extending load, which is correct! So the extend is absorbed into each load and becomes free. The only cost is then the SplitCost, which I guess could account for the additional load required.
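
For the nxv16i8 to nxv16i16 zext above, the cost returned for the extend then works out roughly as follows (assuming BasicTTIImpl's default getVectorSplitCost() of 1 and that each half-width extending masked load is free after this patch):

  SplitSrc  = false  (nxv16i8 is already a legal SVE type)
  SplitDst  = true   (nxv16i16 is split into two nxv8i16 halves)
  SplitCost = 1      (only one side splits, so the split is not free)
  Cost      = SplitCost + 2 * getCastInstrCost(zext, nxv8i16, nxv8i8, Masked)
            = 1 + 2 * 0
            = 1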

That would make sense for normal loads, but masked loads will not split like that (unless they can extend the mask). https://godbolt.org/z/x9T8vP6Kx. It may be simpler to be more precise about the cost if it returned it directly.

I see what you mean. At the moment we won't lower to two extending loads, although we could if we thought it would help - it would just require a punpklo and punpkhi instead of sunpklo and sunpkhi. However, I think that extends the scope of this patch beyond what was originally intended, which is to only consider extends to legal types. This patch isn't changing the behaviour of BasicTTIImpl::getCastInstrCost - before and after this patch we are splitting the types in exactly the same way. I can have a look at what's required to be more precise, but I think that's going to require adding many entries to the existing table to deal with all the possible combinations. I feel that might be better in a separate patch, rather than complicating this one?

That would make sense for normal loads, but masked loads will not split like that (unless they can extend the mask). https://godbolt.org/z/x9T8vP6Kx. It may be simpler to be more precise about the cost if it returned it directly.

Hi @dmgreen, I've rebased this patch now that https://reviews.llvm.org/D142456 has landed, which explicitly accounts for extends to illegal types, so I think your concerns should be addressed now!

dmgreen accepted this revision.Apr 19 2023, 9:02 AM

I see. I am surprised that patch could be committed without causing regressions again.

The way this patch works makes it look like extends of masked loads should cost the same as extends of normal loads. I can't say I'm a huge fan of that, just because it is not really how they work, and you have to reason about what the costs will become. So long as there are tests, though, that should not be a problem, and we can see we are getting the right costs out. The numbers do now look good.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2130

You can drop the brackets around CCH == TTI::CastContextHint::Masked

I see. I am surprised that patch could be committed without causing regressions again.

Yeah, @hassnaa-arm was able to reland that because of this patch https://reviews.llvm.org/D147522

This revision was landed with ongoing or failed builds.Apr 20 2023, 1:49 AM
This revision was automatically updated to reflect the committed changes.