This is an archive of the discontinued LLVM Phabricator instance.

Differential D113772

[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown
Closed · Public

Authored by david-arm on Nov 12 2021, 6:57 AM.

Details

Reviewers

sdesmalen
kmclaughlin
fhahn
simoll
peterwaller-arm

Commits

rG670dd402441f: [Analysis] Fix getNumberOfParts to return 0 when the answer is unknown

Summary

When asking how many parts are required for a scalable vector type
there are occasions when it cannot be computed. For example, <vscale x 1 x i3>
is one such vector for AArch64+SVE because at the moment no matter how we
promote the i3 type we never end up with a legal vector. This means
that getTypeConversion returns TypeScalarizeScalableVector as the
LegalizeKind, and then getTypeLegalizationCost returns an invalid cost.
This then causes BasicTTIImpl::getNumberOfParts to dereference an invalid
cost, which triggers an assert. This patch changes getNumberOfParts to
return 0 for such cases, since the definition of getNumberOfParts in
TargetTransformInfo.h states that we can use a return value of 0 to represent
an unknown answer.
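
As a rough, standalone illustration of the behaviour described above (not the LLVM code itself), the sketch below models an instruction cost as a std::optional, where an empty value plays the role of an invalid cost. The names ModelCost, legalizationCostFor and numberOfParts are hypothetical stand-ins introduced only for this example, and the rule that exactly one type fails to legalize is an assumption of the toy model.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for an instruction cost: an empty optional
// models an invalid cost.
using ModelCost = std::optional<unsigned>;

// Hypothetical legalization query. In this toy model only the type spelled
// "<vscale x 1 x i3>" fails to legalize; every other type needs one part.
ModelCost legalizationCostFor(const std::string &TypeName) {
  if (TypeName == "<vscale x 1 x i3>")
    return std::nullopt;
  return 1u;
}

// Mirrors the shape of the fix: check validity before reading the value,
// and return 0 to mean "the number of parts is unknown".
unsigned numberOfParts(const std::string &TypeName) {
  ModelCost Cost = legalizationCostFor(TypeName);
  return Cost.has_value() ? *Cost : 0u;
}
```

Reading the value without the validity check is what corresponds to the asserting path in the real code; the guarded form turns that failure into a sentinel the caller can test for.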

Currently, LoopVectorize.cpp is the only place where we need to check for
0 as a return value, because all other instances will not currently
ask for the number of parts for <vscale x 1 x iX> types.

In addition, I have changed the target-independent interface for
getNumberOfParts to return 1 and assume there is a single register
that can fit the type. The loop vectoriser has lots of tests that are
target-independent and they relied upon the 0 value to mean the
answer is known and that we are not scalarising the vector.
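
The interaction between the 0 sentinel and the new default of 1 can be sketched in isolation. This is a simplified model under stated assumptions, not the vectoriser's code: ModelVectorizationCost and classify are hypothetical names, and the parameters stand in for the vectorisation factor and the result of a getNumberOfParts-style query.

```cpp
// Hypothetical result type: whether the cost stays valid, and whether the
// type is considered "not scalarized".
struct ModelVectorizationCost {
  bool CostIsValid;
  bool TypeNotScalarized;
};

// IsVectorVF: the vectorisation factor describes a vector, not a scalar.
// NumParts:   result of the number-of-parts query (0 means unknown).
// MinVF:      known minimum number of lanes for the vectorisation factor.
ModelVectorizationCost classify(bool IsVectorVF, unsigned NumParts,
                                unsigned MinVF) {
  ModelVectorizationCost Result{true, false};
  if (!IsVectorVF)
    return Result;
  if (NumParts == 0) {
    // Unknown number of parts: mark the cost invalid instead of guessing.
    Result.CostIsValid = false;
  } else {
    // Fewer parts than lanes means the type was not fully scalarized.
    Result.TypeNotScalarized = NumParts < MinVF;
  }
  return Result;
}
```

In this model, a default of 1 gives classify(true, 1, 4) a valid cost with TypeNotScalarized set, which matches what the target-independent tests previously got from the comparison 0 < VF. Keeping 0 as the default would instead make every such cost invalid once 0 is treated as "unknown".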

I have added tests here that show we correctly return an invalid cost
for VF=vscale x 1 when the loop contains unusual types such as i7:

Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. · Nov 12 2021, 6:57 AM

Herald added subscribers: ctetreau, CarolineConcatto, hiraditya, kristof.beyls. · Nov 12 2021, 6:57 AM

david-arm requested review of this revision. · Nov 12 2021, 6:57 AM

Herald added a project: Restricted Project. · Nov 12 2021, 6:57 AM

Herald added a subscriber: llvm-commits.

An alternative solution to changing llvm/include/llvm/Analysis/TargetTransformInfoImpl.h would be to fix LoopVectorize.cpp so that it never calls getNumberOfParts unless we have defined a target. For example, we could just set TypeNotScalarized to be true always when there is no target specified. I wasn't sure which solution was best so I pushed this patch up for now!

Harbormaster completed remote builds in B133953: Diff 386840. · Nov 12 2021, 7:40 AM

david-arm added a child revision: D113777: [Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector. · Nov 12 2021, 8:32 AM

When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because no matter how we promote the i3 type we never end up with a legal vector.

Not sure I agree with the premise here. Legalizing to <vscale x 2 x i64> should be a viable strategy. I mean, under most circumstances I wouldn't expect the vectorizer to construct vscale x 1 vectors, but we've already done a significant amount of work to allow lowering such vectors.

In D113772#3128166, @efriedma wrote:

When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because no matter how we promote the i3 type we never end up with a legal vector.

Not sure I agree with the premise here. Legalizing to <vscale x 2 x i64> should be a viable strategy. I mean, under most circumstances I wouldn't expect the vectorizer to construct vscale x 1 vectors, but we've already done a significant amount of work to allow lowering such vectors.

Hi @efriedma, you're right that at some point we might be able to support this should we care about <vscale x 1 x iX> types. However, this patch is more about fixing up a genuine hole in how we use getNumberOfParts to determine if something has been vectorised or not. At the moment, for <vscale x 1 x i3> types we do crash because getNumberOfParts dereferences an invalid cost so I'd like to fix this hole first as a priority to at least stabilise the vectoriser in the short term. Perhaps I can update the commit message to be less misleading?

Given that TargetLoweringBase::getTypeLegalizationCost() can currently fail, I guess this patch makes sense. Long-term, I'm not sure we want it to fail in cases like this, but we don't need to deal with that right now.

The commit message should probably mention TypeScalarizeScalableVector somewhere.

Seems like a sensible fix to me. Thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp, line 7432:
Not something introduced by your patch, but the name `VectorTy` seems wrong here :)

This revision is now accepted and ready to land. · Nov 16 2021, 6:24 AM

david-arm edited the summary of this revision. · Nov 17 2021, 2:59 AM

Closed by commit rG670dd402441f: [Analysis] Fix getNumberOfParts to return 0 when the answer is unknown (authored by david-arm). · Nov 17 2021, 4:07 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG670dd402441f: [Analysis] Fix getNumberOfParts to return 0 when the answer is unknown.

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	3 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	11 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll	51 lines

Diff 387895

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ public:
   }

   InstructionCost getCallInstrCost(Function *F, Type *RetTy,
                                    ArrayRef<Type *> Tys,
                                    TTI::TargetCostKind CostKind) const {
     return 1;
   }

-  unsigned getNumberOfParts(Type *Tp) const { return 0; }
+  // Assume that we have a register of the right size for the type.
+  unsigned getNumberOfParts(Type *Tp) const { return 1; }

   InstructionCost getAddressComputationCost(Type *Tp, ScalarEvolution *,
                                             const SCEV *) const {
     return 0;
   }

   InstructionCost getArithmeticReductionCost(unsigned, VectorType *,
                                              Optional<FastMathFlags> FMF,

llvm/include/llvm/CodeGen/BasicTTIImpl.h

@@ ... @@ InstructionCost getCallInstrCost(Function *F, Type *RetTy,
                                    ArrayRef<Type *> Tys,
                                    TTI::TargetCostKind CostKind) {
     return 10;
   }

   unsigned getNumberOfParts(Type *Tp) {
     std::pair<InstructionCost, MVT> LT =
         getTLI()->getTypeLegalizationCost(DL, Tp);
-    return *LT.first.getValue();
+    return LT.first.isValid() ? *LT.first.getValue() : 0;
   }

   InstructionCost getAddressComputationCost(Type *Ty, ScalarEvolution *,
                                             const SCEV *) {
     return 0;
   }

   /// Try to calculate arithmetic and shuffle op costs for reduction intrinsics.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ if (InstSet.count(I))
         (getInstructionCost(I, ElementCount::getFixed(1)).first *
          VF.getKnownMinValue()),
         false);
   }

   Type *VectorTy;
   InstructionCost C = getInstructionCost(I, VF, VectorTy);

-  bool TypeNotScalarized =
-      VF.isVector() && VectorTy->isVectorTy() &&
-      TTI.getNumberOfParts(VectorTy) < VF.getKnownMinValue();
+  bool TypeNotScalarized = false;
+  if (VF.isVector() && VectorTy->isVectorTy()) {
+    unsigned NumParts = TTI.getNumberOfParts(VectorTy);
+    if (NumParts)
+      TypeNotScalarized = NumParts < VF.getKnownMinValue();
+    else
+      C = InstructionCost::getInvalid();
+  }
   return VectorizationCostTy(C, TypeNotScalarized);
 }

 InstructionCost
 LoopVectorizationCostModel::getScalarizationOverhead(Instruction *I,
                                                      ElementCount VF) const {

   // There is no mechanism yet to create a scalable scalarization loop,

llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

This file was added.

				; REQUIRES: asserts
				; RUN: opt -scalable-vectorization=on -loop-vectorize -S < %s -debug 2>%t | FileCheck %s
				; RUN: cat %t | FileCheck %s --check-prefix=DEBUG

				target triple = "aarch64-unknown-linux-gnu"

				; DEBUG: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %indvars.iv1294 = phi i7 [ %indvars.iv.next1295, %for.body ], [ 0, %entry ]
				; DEBUG: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %addi7 = add i7 %indvars.iv1294, 0
				; DEBUG: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %indvars.iv.next1295 = add i7 %indvars.iv1294, 1

				define void @induction_i7(i64* %dst) #0 {
				; CHECK-LABEL: @induction_i7(
				; CHECK: vector.ph:
				; CHECK: [[TMP4:%.*]] = call <vscale x 2 x i8> @llvm.experimental.stepvector.nxv2i8()
				; CHECK: [[TMP5:%.*]] = trunc <vscale x 2 x i8> %4 to <vscale x 2 x i7>
				; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 2 x i7> [[TMP5]], zeroinitializer
				; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 2 x i7> [[TMP6]], shufflevector (<vscale x 2 x i7> insertelement (<vscale x 2 x i7> poison, i7 1, i32 0), <vscale x 2 x i7> poison, <vscale x 2 x i32> zeroinitializer)
				; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i7> zeroinitializer, [[TMP7]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 2 x i7> [ [[INDUCTION]], %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
				; CHECK-NEXT: [[TMP10:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 2 x i7> [[VEC_IND]], zeroinitializer
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[TMP10]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, i64* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP14:%.*]] = bitcast i64* [[TMP13]] to <vscale x 2 x i64>*
				; CHECK-NEXT: store <vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64>* [[TMP14]], align 8
				; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP16:%.*]] = mul i64 [[TMP15]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 2 x i7> [[VEC_IND]],
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv1294 = phi i7 [ %indvars.iv.next1295, %for.body ], [ 0, %entry ]
				%indvars.iv1286 = phi i64 [ %indvars.iv.next1287, %for.body ], [ 0, %entry ]
				%addi7 = add i7 %indvars.iv1294, 0
				%arrayidx = getelementptr inbounds i64, i64* %dst, i64 %indvars.iv1286
				store i64 0, i64* %arrayidx, align 8
				%indvars.iv.next1287 = add nuw nsw i64 %indvars.iv1286, 1
				%indvars.iv.next1295 = add i7 %indvars.iv1294, 1
				%exitcond = icmp eq i64 %indvars.iv.next1287, 64
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

				attributes #0 = {"target-features"="+sve"}