The BasicTTIImpl implementation of getCastInstrCost ensures
that the cost of zext/sext is 0 when following a load if we
know the combined extending load is legal. For SVE we can do
the same for masked loads too, since they use exactly the
same underlying instruction.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1837–1838 | Good suggestion, thanks @sdesmalen ! |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2152 | Is this always true that they are equivalent? For a nxv8i16 load zext to nxv8i32 (without masking) you can convert it into a pair of extending load (each cost 1, so load+zext costs 2 total). The same can't be done for nxv8i16 masked_load zext to nxv8i32 without either converting the nxv8i1 mask to two nxv4i1 masks, or zext a single load with a pair of uunpk's. (For MVE both are expensive so we give the instruction a high cost, preferring lower vector factors). | |
llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll | ||
168 | If I'm reading this correctly, there are tests here for loading smaller types and extending them to legal types, but none for loading legal types and extending them. They might be worth adding. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2152 | Good point! I think the new version should fix that. |
llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll | ||
---|---|---|
113–117 | Do you know why this has gone from 3 to 1, if the Dst type is not legal? I think I would expect the score to be 2! |
llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll | ||
---|---|---|
113–117 | I assumed this was because it was deciding to create two legal extending loads and then stitch the results together, i.e. reinvoke getCastInstrCost with the type split in half. %load.nxv88.1 = call <vscale x 16 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef) %zext.nxv16i8to16.1 = zext <vscale x 8 x i8> %load.nxv8i8.1 to <vscale x 8 x i16> %load.nxv8i8.2 = call <vscale x 16 x i8> @llvm.masked.load.nxv8i8.p0(ptr undef, i32 8, <vscale x 8 x i1> undef, <vscale x 8 x i8> undef) %zext.nxv16i8to16.2 = zext <vscale x 8 x i8> %load.nxv8i8.2 to <vscale x 8 x i16> %zext.nxv16i8to16 = concat (%load.nxv8i8.1, %load.nxv8i8.2) so that the only cost of the zext is just the concat, but it's a good question. I can double check. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2152 | This is in BasicTTIImpl::getCastInstrCost: // If we are legalizing by splitting, query the concrete TTI for the cost // of casting the original vector twice. We also need to factor in the // cost of the split itself. Count that as 1, to be consistent with // getTypeLegalizationCost(). bool SplitSrc = TLI->getTypeAction(Src->getContext(), TLI->getValueType(DL, Src)) == TargetLowering::TypeSplitVector; bool SplitDst = TLI->getTypeAction(Dst->getContext(), TLI->getValueType(DL, Dst)) == TargetLowering::TypeSplitVector; if ((SplitSrc || SplitDst) && SrcVTy->getElementCount().isVector() && DstVTy->getElementCount().isVector()) { Type *SplitDstTy = VectorType::getHalfElementsVectorType(DstVTy); Type *SplitSrcTy = VectorType::getHalfElementsVectorType(SrcVTy); T *TTI = static_cast<T *>(this); // If both types need to be split then the split is free. InstructionCost SplitCost = (!SplitSrc || !SplitDst) ? TTI->getVectorSplitCost() : 0; return SplitCost + (2 * TTI->getCastInstrCost(Opcode, SplitDstTy, SplitSrcTy, CCH, CostKind, I)); } which explains what's happening. It splits the types, then recalculates the zext/sext when the dest is a legal type. It just so happens that this becomes a legal extending load, which is correct! So the extend is absorbed into each load and becomes free. The only cost is then the SplitCost, which I guess could account for the additional load required. |
That would make sense for normal loads, but masked loads will not split like that (unless they can extend the mask). https://godbolt.org/z/x9T8vP6Kx. It may be simpler to be more precise about the cost if it returned it directly.
I see what you mean. At the moment we won't lower to two extending loads, although we could do if we thought it help - it would just require a punpklo and punpkhi instead of sunpklo and sunpkhi. However, I think that extends the scope of this patch beyond what was originally intended, which is to only consider extends to legal types. This patch isn't changing the behaviour of BasicTTIImpl::getCastInstrCost - before and after this patch we are splitting the types in exactly the same way. I can have a look at what's required to be more precise, but I think that's going to require adding many entries adding to the existing table to deal with all the possible combinations. I feel that might be better in a separate patch, rather than complicate this one?
Hi @dmgreen , 've rebased this patch now that https://reviews.llvm.org/D142456 has landed, which explicitly accounts for extends to illegal types, so I think your concerns should be addressed now!
I see. I am surprised that patch could be committed without causing regressions again.
The way this patch works makes it look like extends of masked costs should be the same as normal loads, I can't say I'm a huge fan of that just because it is not really how they work. Plus you have to try to reason about what the costs will become. So long as there are tests though, that should not be a problem and we can see we are getting the right costs out. The number do now look good.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2151 | You can drop the brackets around CCH == TTI::CastContextHint::Masked |
Yeah, @hassnaa-arm was able to reland that because of this patch https://reviews.llvm.org/D147522
nit: this can be hasSVEorSME