This is an archive of the discontinued LLVM Phabricator instance.


Differential D136153

[AArch64] Allow cost computation for interleaved accesses
Abandoned · Public

Authored by mgabka on Oct 18 2022, 3:16 AM.

Details

Reviewers

paulwalker-arm
huntergr
david-arm
igor.kirillov
bsmith

Summary

This commit moves the rejection of scalable vectors during loop
  vectorization up into the target-specific cost model.
  Before this change, interleaved memory accesses were rejected early,
  inside the loop vectorizer cost model.

  BasicTTIImplBase returns an invalid cost for scalable vectors,
  which should be enough for other backends to rely on.

  AArch64 will still return an invalid cost for scalable vectors
  for interleaved accesses, as more work is required to enable them.
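
The control flow described in the summary can be sketched outside of LLVM as follows. This is only an illustration under stated assumptions: `Cost`, `VecDesc`, and the three functions are hypothetical stand-ins (they are not LLVM APIs), `std::nullopt` models an invalid `InstructionCost`, the legality test is reduced to the "64 or a multiple of 128 bits" rule, and the fallback cost is a placeholder value.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::InstructionCost: std::nullopt models an
// invalid cost.
using Cost = std::optional<unsigned>;

// Hypothetical vector description: known-minimum element count, element size
// in bits, and whether the count is multiplied by vscale at run time.
struct VecDesc {
  unsigned KnownMinElts;
  unsigned ElSizeBits;
  bool Scalable;
};

// Models the generic fallback, which rejects scalable vectors up front.
// The returned value for fixed vectors is a placeholder, not a real cost.
Cost baseInterleavedCost(const VecDesc &V) {
  if (V.Scalable)
    return std::nullopt;
  return V.KnownMinElts;
}

// Models a target hook that runs its costing code for every vector type and
// only afterwards discards the result for scalable vectors.
Cost targetInterleavedCost(const VecDesc &V, unsigned Factor) {
  assert(Factor >= 2 && "Invalid interleave factor");
  if (V.KnownMinElts % Factor == 0) {
    unsigned SubVecBits = (V.KnownMinElts / Factor) * V.ElSizeBits;
    // Simplified legality rule: 64 bits or a non-zero multiple of 128 bits.
    if (SubVecBits == 64 || (SubVecBits != 0 && SubVecBits % 128 == 0)) {
      unsigned C = Factor * std::max(1u, (SubVecBits + 127) / 128);
      // The cost was computed above, but it is ignored for scalable vectors.
      if (V.Scalable)
        return std::nullopt;
      return C;
    }
  }
  return baseInterleavedCost(V);
}

// Models the vectorizer side after the change: no early exit for scalable
// VFs, the decision is delegated to the target hook.
Cost interleaveGroupCost(const VecDesc &V, unsigned Factor) {
  return targetInterleavedCost(V, Factor);
}
```

With this shape, a scalable vector still ends up with an invalid cost whichever path it takes, so moving the rejection does not by itself change which loops get vectorized.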

Event Timeline

mgabka created this revision. Oct 18 2022, 3:16 AM

Herald added a project: Restricted Project. Oct 18 2022, 3:16 AM

Herald added subscribers: hiraditya, kristof.beyls.

mgabka requested review of this revision. Oct 18 2022, 3:16 AM

Herald added a project: Restricted Project. Oct 18 2022, 3:16 AM

Herald added subscribers: llvm-commits, pcwang-thead, alextsao1999.

Harbormaster completed remote builds in B192696: Diff 468471. Oct 18 2022, 4:05 AM

Looks like the lack of a vscale_range attribute was causing a division-by-zero exception.

Fixed an unclosed attributes declaration in a test file.

Harbormaster completed remote builds in B192758: Diff 468552. Oct 18 2022, 9:13 AM

mgabka added reviewers: paulwalker-arm, huntergr, david-arm, igor.kirillov. Oct 18 2022, 9:15 AM

mgabka added a reviewer: bsmith. Oct 27 2022, 2:01 AM

paulwalker-arm added inline comments. Nov 16 2022, 8:44 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14033–14034	The usage of `getMinSVEVectorSizeInBits` here is only relevant for fixed length vector types. When working with scalable types we already know the size of the legal types (i.e. vscale * 128) and so the `: 128` part should be fine because `Scalable_EC/(vscale * 128)` ==> `EC.getKnownMinValue() / 128`. Fixing this should mean you no longer require `vscale_range` attributes for your scalable vector tests.
14048–14049	Is this correct? Do you really want to say all scalable vector types are legal?
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2489–2490	This is worth a comment because the naive optimisation is to simply move the check higher up whereas you deliberately want to exercise as much of the costing code as possible, even though we'll currently ignore the result.
2496–2498	Perhaps simplify to just "All other uses of scalable vectors are not legal."? With that said, looking at the base implementation of getInterleavedMemoryOpCost suggests it does the right thing so you could remove this code and just fall into the base version as before.
llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
1–6	(On Diff #468552)	Does the code affect the output of these tests? Perhaps in this instance it is better to commit the tests first in isolation, so that it is clear what effect this patch has (or rather, to show that this patch does not change the existing behaviour).
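
The arithmetic behind the comment on lines 14033–14034 can be checked with a small standalone sketch. It is one possible reading of the suggestion, not the code from the patch: `ElemCount` is a hypothetical stand-in for `llvm::ElementCount`, and `MinSVEBits` stands in for the value of `getMinSVEVectorSizeInBits`. For a scalable type both the vector size and the legal register size scale with vscale, so vscale cancels and only the known-minimum size is divided by 128.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for llvm::ElementCount: a known-minimum element count
// plus a flag saying whether it is multiplied by vscale at run time.
struct ElemCount {
  unsigned KnownMin;
  bool Scalable;
};

// Sketch of the number of interleaved accesses needed for one vector.
// MinSVEBits models the subtarget's minimum SVE register size; UseScalable
// models "a fixed-length vector that is lowered using SVE".
unsigned numInterleavedAccesses(ElemCount EC, unsigned ElSizeBits,
                                unsigned MinSVEBits, bool UseScalable) {
  assert(ElSizeBits != 0 && "element size must be non-zero");
  unsigned VecSize = 128;
  // Only fixed-length vectors depend on the minimum SVE register size.
  // For scalable types: (vscale * KnownMinBits) / (vscale * 128)
  // reduces to KnownMinBits / 128, so 128 is always the right divisor.
  if (UseScalable && !EC.Scalable)
    VecSize = std::max(MinSVEBits, 128u);
  unsigned KnownMinBits = EC.KnownMin * ElSizeBits;
  return std::max(1u, (KnownMinBits + 127) / VecSize);
}
```

In this sketch the result for a scalable type does not depend on `MinSVEBits` at all, which matches the reviewer's point that `vscale_range` should not be needed for the scalable vector tests.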

mgabka added inline comments. Dec 1 2022, 7:24 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14033–14034	ok, I will simplify it
14048–14049	My understanding is that at the moment we do not know which scalable vector types are legal for this type of access; it depends on how this is going to be implemented, right? getInterleavedMemoryOpCost just calls isLegalInterleavedAccessType, but later there is a check that returns an invalid cost for all scalable vector types. I should probably add a TODO comment here. I thought that this might be enough for now; we will need to add some extra tests later to check whether the right types are legal. At the moment isLegalInterleavedAccessType is only used for fixed-width SVE, so I think I should add an extra assert here as well. I might also try to check the scalable types here, assuming legal element sizes etc., but at the moment we aren't able to test it.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2489–2490	ok will add a comment, fair point
2496–2498	yeah you are right, it has an early check at the beginning, I will change it
llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
1–6	(On Diff #468552)	My patch does not affect the output of this test. The test is a copy of llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll, but it uses the aarch64 target triple and the sve feature. Is it OK to do it like that? I wanted to demonstrate that, despite removing the early exit from LoopVectorizationCostModel::getInterleaveGroupCost for scalable VFs, we still do not attempt to vectorize interleaved groups for AArch64. In that case, yes, it might be worth pre-committing this test.

paulwalker-arm added inline comments. Dec 6 2022, 8:16 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14048–14049	This is target specific code so we can rely on implicit knowledge. For NEON you'll see the function only accepts the typical element types and then ends with `Ensure the total vector size is 64 or a multiple of 128`. So it seems reasonable to expect similar for scalable vectors.
llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
1–6	(On Diff #468552)	Yep, by pre-committing the test it becomes trivial to prove the patch works as expected by not changing the current behaviour.
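
The legality rule discussed for lines 14048–14049 can be sketched as a standalone predicate. This is a simplified illustration, not the function from the patch: `ElemCount` and `isLegalInterleavedType` are hypothetical names, the SVE predicate-pattern check and the fixed-length SVE path are left out, and the scalable case assumes the "known-minimum size is exactly 128 bits" condition that appears in Diff 482769.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::ElementCount.
struct ElemCount {
  unsigned KnownMin;
  bool Scalable;
};

// Simplified legality check for an interleaved access type.
bool isLegalInterleavedType(ElemCount EC, unsigned ElSizeBits) {
  assert(EC.KnownMin != 0 && "element count must be non-zero");

  // The number of vector elements must be greater than 1.
  if (EC.KnownMin < 2)
    return false;

  // Only the typical element sizes are accepted.
  if (ElSizeBits != 8 && ElSizeBits != 16 && ElSizeBits != 32 &&
      ElSizeBits != 64)
    return false;

  unsigned KnownMinBits = EC.KnownMin * ElSizeBits;

  // Scalable vectors: accept only types that fill one SVE register granule,
  // i.e. a known-minimum size of exactly 128 bits.
  if (EC.Scalable)
    return KnownMinBits == 128;

  // Fixed (NEON) vectors: total size is 64 bits or a multiple of 128 bits.
  return KnownMinBits == 64 || KnownMinBits % 128 == 0;
}
```

Under these assumptions `<vscale x 4 x i32>` is accepted while `<vscale x 8 x i32>` is not, mirroring the NEON rule rather than declaring every scalable type legal.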

Matt added a subscriber: Matt. Dec 7 2022, 6:13 PM

Addressed review comments.

mgabka added a parent revision: D140003: [AArch64][NFC] Precommit tests for SVE interleaved accesses cost computation. Dec 14 2022, 2:26 AM

Harbormaster completed remote builds in B203060: Diff 482769. Dec 14 2022, 4:05 AM

replaced by https://reviews.llvm.org/D145163

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	23 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	18 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	5 lines

Diff 482769

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]	bool AArch64TargetLowering::hasPairedLoad(EVT LoadedType,
return NumBits == 32 || NumBits == 64;		return NumBits == 32 || NumBits == 64;
}		}

/// A helper function for determining the number of interleaved accesses we		/// A helper function for determining the number of interleaved accesses we
/// will generate when lowering accesses of the given type.		/// will generate when lowering accesses of the given type.
unsigned AArch64TargetLowering::getNumInterleavedAccesses(		unsigned AArch64TargetLowering::getNumInterleavedAccesses(
VectorType *VecTy, const DataLayout &DL, bool UseScalable) const {		VectorType *VecTy, const DataLayout &DL, bool UseScalable) const {
unsigned VecSize = 128;		unsigned VecSize = 128;
		unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
		auto EC = VecTy->getElementCount();
if (UseScalable)		if (UseScalable)
VecSize = std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u);		VecSize = std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u);
return std::max<unsigned>(1, (DL.getTypeSizeInBits(VecTy) + 127) / VecSize);		return std::max<unsigned>(1,
		(EC.getKnownMinValue() * ElSize + 127) / VecSize);
}		}

MachineMemOperand::Flags		MachineMemOperand::Flags
AArch64TargetLowering::getTargetMMOFlags(const Instruction &I) const {		AArch64TargetLowering::getTargetMMOFlags(const Instruction &I) const {
if (Subtarget->getProcFamily() == AArch64Subtarget::Falkor &&		if (Subtarget->getProcFamily() == AArch64Subtarget::Falkor &&
I.getMetadata(FALKOR_STRIDED_ACCESS_MD) != nullptr)		I.getMetadata(FALKOR_STRIDED_ACCESS_MD) != nullptr)
return MOStridedAccess;		return MOStridedAccess;
return MachineMemOperand::MONone;		return MachineMemOperand::MONone;
}		}

bool AArch64TargetLowering::isLegalInterleavedAccessType(		bool AArch64TargetLowering::isLegalInterleavedAccessType(
VectorType *VecTy, const DataLayout &DL, bool &UseScalable) const {		VectorType *VecTy, const DataLayout &DL, bool &UseScalable) const {

unsigned VecSize = DL.getTypeSizeInBits(VecTy);
unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());		unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
unsigned NumElements = cast<FixedVectorType>(VecTy)->getNumElements();		auto EC = VecTy->getElementCount();

UseScalable = false;		UseScalable = false;

// Ensure that the predicate for this number of elements is available.		// Ensure that the predicate for this number of elements is available.
if (Subtarget->hasSVE() && !getSVEPredPatternFromNumElements(NumElements))		if (Subtarget->hasSVE() &&
		!getSVEPredPatternFromNumElements(EC.getKnownMinValue()))
return false;		return false;

// Ensure the number of vector elements is greater than 1.		// Ensure the number of vector elements is greater than 1.
if (NumElements < 2)		if (EC.getKnownMinValue() < 2)
return false;		return false;

// Ensure the element type is legal.		// Ensure the element type is legal.
if (ElSize != 8 && ElSize != 16 && ElSize != 32 && ElSize != 64)		if (ElSize != 8 && ElSize != 16 && ElSize != 32 && ElSize != 64)
return false;		return false;

		if (EC.isScalable()) {
		if (EC.getKnownMinValue() * ElSize == 128)
		return true;
		return false;
		}

		unsigned VecSize = DL.getTypeSizeInBits(VecTy);
if (Subtarget->forceStreamingCompatibleSVE() ||		if (Subtarget->forceStreamingCompatibleSVE() ||
(Subtarget->useSVEForFixedLengthVectors() &&		(Subtarget->useSVEForFixedLengthVectors() &&
(VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||		(VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
(VecSize < Subtarget->getMinSVEVectorSizeInBits() &&		(VecSize < Subtarget->getMinSVEVectorSizeInBits() &&
isPowerOf2_32(NumElements) && VecSize > 128)))) {		isPowerOf2_32(EC.getKnownMinValue()) && VecSize > 128)))) {
UseScalable = true;		UseScalable = true;
return true;		return true;
}		}

// Ensure the total vector size is 64 or a multiple of 128. Types larger than		// Ensure the total vector size is 64 or a multiple of 128. Types larger than
// 128 will be split into multiple interleaved accesses.		// 128 will be split into multiple interleaved accesses.
return VecSize == 64 || VecSize % 128 == 0;		return VecSize == 64 || VecSize % 128 == 0;
}		}

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[...]	InstructionCost AArch64TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Ty,
return LT.first;		return LT.first;
}		}

InstructionCost AArch64TTIImpl::getInterleavedMemoryOpCost(		InstructionCost AArch64TTIImpl::getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,		Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
bool UseMaskForCond, bool UseMaskForGaps) {		bool UseMaskForCond, bool UseMaskForGaps) {
assert(Factor >= 2 && "Invalid interleave factor");		assert(Factor >= 2 && "Invalid interleave factor");
auto *VecVTy = cast<FixedVectorType>(VecTy);		auto *VecVTy = cast<VectorType>(VecTy);

if (!UseMaskForCond && !UseMaskForGaps &&		if (!UseMaskForCond && !UseMaskForGaps &&
Factor <= TLI->getMaxSupportedInterleaveFactor()) {		Factor <= TLI->getMaxSupportedInterleaveFactor()) {
unsigned NumElts = VecVTy->getNumElements();		unsigned NumElts = VecVTy->getElementCount().getKnownMinValue();
auto *SubVecTy =		auto *SubVecTy =
FixedVectorType::get(VecTy->getScalarType(), NumElts / Factor);		VectorType::get(VecVTy->getElementType(),
		VecVTy->getElementCount().divideCoefficientBy(Factor));

// ldN/stN only support legal vector types of size 64 or 128 in bits.		// ldN/stN only support legal vector types of size 64 or 128 in bits.
// Accesses having vector types that are a multiple of 128 bits can be		// Accesses having vector types that are a multiple of 128 bits can be
// matched to more than one ldN/stN instruction.		// matched to more than one ldN/stN instruction.
bool UseScalable;		bool UseScalable;
if (NumElts % Factor == 0 &&		if (NumElts % Factor == 0 &&
TLI->isLegalInterleavedAccessType(SubVecTy, DL, UseScalable))		TLI->isLegalInterleavedAccessType(SubVecTy, DL, UseScalable)) {
return Factor * TLI->getNumInterleavedAccesses(SubVecTy, DL, UseScalable);		unsigned Cost =
		Factor * TLI->getNumInterleavedAccesses(SubVecTy, DL, UseScalable);
		// Deliberately do not move this check to the beginning of the function,
		// as we want to execute as much of the costing code as possible for scalable vectors.
		if (isa<ScalableVectorType>(VecTy))
		return InstructionCost::getInvalid();
		return Cost;
		}
}		}

return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace, CostKind,		Alignment, AddressSpace, CostKind,
UseMaskForCond, UseMaskForGaps);		UseMaskForCond, UseMaskForGaps);
}		}

InstructionCost		InstructionCost
AArch64TTIImpl::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) {		AArch64TTIImpl::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) {
InstructionCost Cost = 0;		InstructionCost Cost = 0;
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
for (auto *I : Tys) {		for (auto *I : Tys) {

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[...]	return TTI.getAddressComputationCost(VectorTy) +
TTI.getGatherScatterOpCost(		TTI.getGatherScatterOpCost(
I->getOpcode(), VectorTy, Ptr, Legal->isMaskRequired(I), Alignment,		I->getOpcode(), VectorTy, Ptr, Legal->isMaskRequired(I), Alignment,
TargetTransformInfo::TCK_RecipThroughput, I);		TargetTransformInfo::TCK_RecipThroughput, I);
}		}

InstructionCost		InstructionCost
LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I,		LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
// TODO: Once we have support for interleaving with scalable vectors
// we can calculate the cost properly here.
if (VF.isScalable())
return InstructionCost::getInvalid();

Type *ValTy = getLoadStoreType(I);		Type *ValTy = getLoadStoreType(I);
auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));		auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));
unsigned AS = getLoadStoreAddressSpace(I);		unsigned AS = getLoadStoreAddressSpace(I);
enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

auto Group = getInterleavedAccessGroup(I);		auto Group = getInterleavedAccessGroup(I);
assert(Group && "Fail to get an interleaved access group.");		assert(Group && "Fail to get an interleaved access group.");
