This is an archive of the discontinued LLVM Phabricator instance.

Differential D96356

[SVE][LoopVectorize] Support for vectorization of loops with function calls
Closed · Public

Authored by kmclaughlin on Feb 9 2021, 10:31 AM.

Details

Reviewers

sdesmalen
david-arm
dmgreen
fhahn
efriedma

Commits

rGfea06efe7c92: [SVE][LoopVectorize] Support for vectorization of loops with function calls

Summary

Changes getScalarizationOverhead to return an invalid cost for scalable VFs
instead of asserting, and adds some simple tests for loops containing calls
to a function for which a vectorized variant is available.
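The point of the summary is that an invalid cost takes the place of the old assertion: for a scalable VF the cost model can record that scalarizing a call is impossible and still pick a valid alternative, such as a call to a vector variant. The sketch below models this under the assumption (my reading of llvm::InstructionCost, worth checking against its header) that an invalid cost propagates through arithmetic and compares as more expensive than any valid cost. `CostSketch` and `cheaper` are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for llvm::InstructionCost. Assumption: an invalid
// cost is "sticky" under arithmetic and is more expensive than any valid cost.
class CostSketch {
public:
  CostSketch(int64_t V) : Value(V) {} // implicit, so "return 0;" style works
  static CostSketch getInvalid() { return CostSketch(); }

  bool isValid() const { return Value.has_value(); }
  int64_t getValue() const {
    assert(isValid() && "an invalid cost has no value");
    return *Value;
  }

  // Invalid + anything == invalid.
  CostSketch operator+(const CostSketch &RHS) const {
    if (!isValid() || !RHS.isValid())
      return getInvalid();
    return CostSketch(*Value + *RHS.Value);
  }
  CostSketch operator*(int64_t Factor) const {
    if (!isValid())
      return getInvalid();
    return CostSketch(*Value * Factor);
  }

  // Any valid cost is cheaper than an invalid one; two invalid costs are equal.
  bool operator<(const CostSketch &RHS) const {
    if (isValid() != RHS.isValid())
      return isValid();
    return isValid() && *Value < *RHS.Value;
  }

private:
  CostSketch() = default; // the invalid state
  std::optional<int64_t> Value;
};

// Pick the cheaper of two strategies; an invalid one never beats a valid one.
inline CostSketch cheaper(const CostSketch &A, const CostSketch &B) {
  return B < A ? B : A;
}
```

Because the invalid state is sticky, any total that includes the scalarization overhead of a scalable VF becomes invalid, which lets a valid vector-variant cost win without special-casing scalable VFs at every call site.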

Diff Detail

Unit Tests Failed

	Time	Test
	60 ms	x64 debian > Polly.ScopInfo::user_provided_assumptions.ll

Event Timeline

kmclaughlin created this revision. · Feb 9 2021, 10:31 AM

Herald added a reviewer: efriedma. · Feb 9 2021, 10:31 AM

Herald added subscribers: psnobl, hiraditya, tschuett.

kmclaughlin requested review of this revision. · Feb 9 2021, 10:31 AM

Herald added a project: Restricted Project. · Feb 9 2021, 10:31 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B88488: Diff 322437. · Feb 9 2021, 11:06 AM

david-arm added inline comments. · Feb 10 2021, 5:59 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp, line 4930

nit: I think we can remove the braces now.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll

Is it worth adding tests for LLVM intrinsics that use fast math too? For example, test/Transforms/LoopVectorize/PowerPC/widened-massv-vfabi-attr.ll contains something like this:

  %1 = call fast double @llvm.sin.f64(double %conv) #1
  %add = fadd fast double %Sum.0, %1
...
declare double @llvm.sin.f64(double) #0
declare <2 x double> @__sind2_massv(<2 x double>) #0
attributes #0 = { nounwind readnone speculatable willreturn }
attributes #1 = { "vector-function-abi-variant"="_ZGV_LLVM_N2v_llvm.sin.f64(__sind2_massv)" }

Removed unnecessary braces from widenCallInstruction
Added a test for a loop containing an LLVM intrinsic (@llvm.sin.f64)

LGTM! Thanks for the new test!

This revision is now accepted and ready to land. · Feb 11 2021, 5:37 AM

Added the -force-vector-interleave=1 flag to scalable-call.ll.

david-arm added inline comments. · Feb 11 2021, 5:47 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll, line 78

nit: If you could make the tests consistent in terms of the number of CHECK lines before merging, that would be great! Perhaps disable interleaving with `-force-vector-interleave=1`?

Closed by commit rGfea06efe7c92: [SVE][LoopVectorize] Support for vectorization of loops with function calls (authored by kmclaughlin). · Feb 12 2021, 5:48 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin marked an inline comment as done.

kmclaughlin added a commit: rGfea06efe7c92: [SVE][LoopVectorize] Support for vectorization of loops with function calls.

Revision Contents

Path	Size
llvm/lib/Analysis/TargetTransformInfo.cpp	1 line
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	7 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll	85 lines

Diff 322437

llvm/lib/Analysis/TargetTransformInfo.cpp

  @@ ... @@ IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
     ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
   }

   IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
                                                    const CallBase &CI,
                                                    ElementCount Factor)
       : RetTy(CI.getType()), IID(Id), VF(Factor) {

  -  assert(!Factor.isScalable() && "Scalable vectors are not yet supported");
     if (auto *FPMO = dyn_cast<FPMathOperator>(&CI))
       FMF = FPMO->getFastMathFlags();

     Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());
     FunctionType *FTy =
         CI.getCalledFunction()->getFunctionType();
     ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
   }
  ...
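Removing this assertion lets IntrinsicCostAttributes carry a scalable Factor. A scalable VF is a known minimum lane count multiplied by vscale, a value fixed only at run time: the test's loop metadata (width 2 with scalable vectorization enabled) produces `<vscale x 2 x double>`, which holds 2 doubles when vscale is 1 and 8 when vscale is 4 (a 512-bit SVE implementation). A hypothetical model loosely based on llvm::ElementCount; the struct and method names are illustrative, not LLVM's:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an element count such as llvm::ElementCount: a known
// minimum number of lanes, optionally multiplied by the runtime value vscale.
struct ElementCountSketch {
  uint64_t KnownMin;
  bool Scalable;

  static ElementCountSketch getFixed(uint64_t N) { return {N, false}; }
  static ElementCountSketch getScalable(uint64_t N) { return {N, true}; }

  bool isScalar() const { return !Scalable && KnownMin == 1; }
  bool isVector() const { return (Scalable && KnownMin != 0) || KnownMin > 1; }

  // Lanes actually processed once vscale is known (for SVE, only at run time).
  uint64_t lanesAt(uint64_t VScale) const {
    assert(VScale >= 1 && "vscale is at least 1");
    return Scalable ? KnownMin * VScale : KnownMin;
  }
};
```

Compile-time code can only reason about `KnownMin`, which is why per-lane cost formulas that work for fixed VFs have no direct answer for scalable ones.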

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

  @@ ... @@ for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;) {
       CSEMap[In] = In;
     }
   }

   InstructionCost
   LoopVectorizationCostModel::getVectorCallCost(CallInst *CI, ElementCount VF,
                                                 bool &NeedToScalarize) {
  -  assert(!VF.isScalable() && "scalable vectors not yet supported.");
     Function *F = CI->getCalledFunction();
     Type *ScalarRetTy = CI->getType();
     SmallVector<Type *, 4> Tys, ScalarTys;
     for (auto &ArgOp : CI->arg_operands())
       ScalarTys.push_back(ArgOp->getType());

     // Estimate cost of scalarized vector call. The source operands are assumed
     // to be vectors, so we need to extract individual elements from there,
  ...
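In the LLVM source of this period, the part of getVectorCallCost elided here computes the scalarized cost roughly as the scalar call cost times the known-minimum VF plus getScalarizationOverhead, then switches to a vector variant (clearing NeedToScalarize) when one exists and is cheaper. Once the overhead is invalid for scalable VFs, a vector variant is the only way a call can get a valid cost at a scalable VF. A simplified model of that decision, not the LLVM code; the function, struct and parameter names are hypothetical, and `std::nullopt` stands in for an invalid cost:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical simplification: std::nullopt plays the role of an invalid cost.
using Cost = std::optional<int64_t>;

struct CallCostResult {
  Cost TotalCost;
  bool NeedToScalarize;
};

// Compare "VF scalar calls plus packing/unpacking" against one call to a
// vector variant, if such a variant exists.
CallCostResult vectorCallCostSketch(int64_t ScalarCallCost, uint64_t KnownMinVF,
                                    Cost ScalarizationOverhead,
                                    Cost VectorVariantCost) {
  assert(KnownMinVF >= 1 && "a VF has at least one lane");
  Cost Scalarized; // stays invalid if the overhead is invalid
  if (ScalarizationOverhead)
    Scalarized = ScalarCallCost * static_cast<int64_t>(KnownMinVF) +
                 *ScalarizationOverhead;

  CallCostResult Result{Scalarized, /*NeedToScalarize=*/true};
  if (!VectorVariantCost)
    return Result; // no vector variant: scalarizing is the only option

  // An invalid scalarized cost loses to any valid vector-call cost.
  if (!Scalarized || *VectorVariantCost < *Scalarized)
    Result = {VectorVariantCost, /*NeedToScalarize=*/false};
  return Result;
}
```

With a scalable VF and no variant, the result stays invalid, so the vectorizer can reject that VF instead of asserting.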
  @@ ... @@ for (auto &I : enumerate(ArgOperands.operands())) {
           Arg = State.get(I.value(), VPIteration(0, 0));
         Args.push_back(Arg);
       }

       Function *VectorF;
       if (UseVectorIntrinsic) {
         // Use vector version of the intrinsic.
         Type *TysForDecl[] = {CI->getType()};
         if (VF.isVector()) {
  -        assert(!VF.isScalable() && "VF is assumed to be non scalable.");
           TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF);
         }
         VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);
         assert(VectorF && "Can't retrieve vector intrinsic.");
       } else {
         // Use vector version of the function call.
         const VFShape Shape = VFShape::get(*CI, VF, false /*HasGlobalPred*/);
   #ifndef NDEBUG
  ...
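On the function-call path, widenCallInstruction builds a VFShape from the call site and the VF, with no global predicate, and the vector function is then looked up for that shape; with the assertion gone, the VF in that shape may now be scalable. A rough model of the lookup, assuming variants are keyed by known-minimum VF, scalability and predication; `ShapeKey`, `VariantTable` and `findVariant` are hypothetical, not LLVM types:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical shape key, loosely modelled on llvm::VFShape.
struct ShapeKey {
  uint64_t KnownMinVF;
  bool Scalable;
  bool HasGlobalPred;
  bool operator<(const ShapeKey &O) const {
    return std::tie(KnownMinVF, Scalable, HasGlobalPred) <
           std::tie(O.KnownMinVF, O.Scalable, O.HasGlobalPred);
  }
};

// Hypothetical per-call-site table of the vector variants a call advertises.
using VariantTable = std::map<ShapeKey, std::string>;

// Returns the vector function for an unpredicated call at this VF, if any.
std::optional<std::string> findVariant(const VariantTable &Table,
                                       uint64_t KnownMinVF, bool Scalable) {
  assert(KnownMinVF >= 1 && "a VF has at least one lane");
  auto It = Table.find({KnownMinVF, Scalable, /*HasGlobalPred=*/false});
  if (It == Table.end())
    return std::nullopt;
  return It->second;
}
```

In this model a fixed-width variant never satisfies a scalable request (or vice versa), which is why the tests declare scalable variants such as `foo_vec` taking `<vscale x 2 x double>`.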
  @@ ... @@ bool TypeNotScalarized =
           TTI.getNumberOfParts(VectorTy) < VF.getKnownMinValue();
     return VectorizationCostTy(C, TypeNotScalarized);
   }

   InstructionCost
   LoopVectorizationCostModel::getScalarizationOverhead(Instruction *I,
                                                        ElementCount VF) {

  -  assert(!VF.isScalable() &&
  -         "cannot compute scalarization overhead for scalable vectorization");
  +  if (VF.isScalable())
  +    return InstructionCost::getInvalid();

     if (VF.isScalar())
       return 0;

     InstructionCost Cost = 0;
     Type *RetTy = ToVectorTy(I->getType(), VF);
     if (!RetTy->isVoidTy() &&
         (!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
       Cost += TTI.getScalarizationOverhead(
  ...
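The early return is needed because scalarization overhead is a sum over lanes: extracting each element of the vector operands and inserting each scalar result back into a vector. For a scalable VF the lane count is unknown at compile time, so there is nothing finite to sum and an invalid cost is the honest answer. A simplified sketch with unit insert/extract costs by default (the real code asks TTI for these costs); the function name and parameters are hypothetical, and `std::nullopt` stands in for an invalid cost:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical: std::nullopt stands in for an invalid cost.
using Cost = std::optional<int64_t>;

// One extract per lane per vector operand, plus one insert per lane to
// rebuild a vector result when the instruction produces a value.
Cost scalarizationOverheadSketch(uint64_t KnownMinVF, bool Scalable,
                                 unsigned NumVectorOperands, bool HasResult,
                                 int64_t InsertCost = 1,
                                 int64_t ExtractCost = 1) {
  assert(KnownMinVF >= 1 && "a VF has at least one lane");
  if (Scalable)
    return std::nullopt; // lane count unknown at compile time
  if (KnownMinVF == 1)
    return 0; // scalar VF: nothing to pack or unpack
  int64_t Lanes = static_cast<int64_t>(KnownMinVF);
  int64_t Total = 0;
  if (HasResult)
    Total += Lanes * InsertCost;
  Total += Lanes * ExtractCost * static_cast<int64_t>(NumVectorOperands);
  return Total;
}
```

The order of the checks mirrors the patch: the scalable case is handled before the scalar case, so no code below it ever sees a scalable VF.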

llvm/test/Transforms/LoopVectorize/AArch64/scalable-call.ll

This file was added.

				; RUN: opt -S -loop-vectorize -instcombine -mattr=+sve -mtriple aarch64-unknown-linux-gnu < %s | FileCheck %s

				define void @vec_load(i64 %N, double* nocapture %a, double* nocapture readonly %b) {
				; CHECK-LABEL: @vec_load
				; CHECK: vector.body:
				; CHECK: %[[LOAD:.*]] = load <vscale x 2 x double>, <vscale x 2 x double>*
				; CHECK: call <vscale x 2 x double> @foo_vec(<vscale x 2 x double> %[[LOAD]])
				entry:
				%cmp7 = icmp sgt i64 %N, 0
				br i1 %cmp7, label %for.body, label %for.end

				for.body: ; preds = %for.body.preheader, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%arrayidx = getelementptr inbounds double, double* %b, i64 %iv
				%0 = load double, double* %arrayidx, align 8
				%1 = call double @foo(double %0) #0
				%add = fadd double %1, 1.000000e+00
				%arrayidx2 = getelementptr inbounds double, double* %a, i64 %iv
				store double %add, double* %arrayidx2, align 8
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1

				for.end: ; preds = %for.body, %entry
				ret void
				}

				define void @vec_scalar(i64 %N, double* nocapture %a, double* nocapture readonly %b) {
				; CHECK-LABEL: @vec_scalar
				; CHECK: vector.body:
				; CHECK: call <vscale x 2 x double> @foo_vec(<vscale x 2 x double> shufflevector (<vscale x 2 x double> insertelement (<vscale x 2 x double> poison, double 1.000000e+01, i32 0), <vscale x 2 x double> poison, <vscale x 2 x i32> zeroinitializer))
				entry:
				%cmp7 = icmp sgt i64 %N, 0
				br i1 %cmp7, label %for.body, label %for.end

				for.body: ; preds = %for.body.preheader, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%0 = call double @foo(double 10.0) #0
				%sub = fsub double %0, 1.000000e+00
				%arrayidx = getelementptr inbounds double, double* %a, i64 %iv
				store double %sub, double* %arrayidx, align 8
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %N
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1

				for.end: ; preds = %for.body, %entry
				ret void
				}

				define void @vec_ptr(i64 %N, i64* noalias %a, i64** readnone %b) {
				; CHECK-LABEL: @vec_ptr
				; CHECK: vector.body:
				; CHECK: %[[LOAD:.*]] = load <vscale x 2 x i64*>, <vscale x 2 x i64*>*
				; CHECK: call <vscale x 2 x i64> @bar_vec(<vscale x 2 x i64*> %[[LOAD]])
				entry:
				%cmp7 = icmp sgt i64 %N, 0
				br i1 %cmp7, label %for.body, label %for.end

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%gep = getelementptr i64*, i64** %b, i64 %iv
				%load = load i64*, i64** %gep
				%call = call i64 @bar(i64* %load) #1
				%arrayidx = getelementptr inbounds i64, i64* %a, i64 %iv
				store i64 %call, i64* %arrayidx
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1024
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1

				for.end:
				ret void
				}

				declare double @foo(double)
				declare i64 @bar(i64*)

				declare <vscale x 2 x double> @foo_vec(<vscale x 2 x double>)
				declare <vscale x 2 x i64> @bar_vec(<vscale x 2 x i64*>)

				attributes #0 = { "vector-function-abi-variant"="_ZGV_LLVM_Nxv_foo(foo_vec)" }
				attributes #1 = { "vector-function-abi-variant"="_ZGV_LLVM_Nxv_bar(bar_vec)" }

				!1 = distinct !{!1, !2, !3}
				!2 = !{!"llvm.loop.vectorize.width", i32 2}
				!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
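The tests rely on the "vector-function-abi-variant" attribute to tell the vectorizer that foo and bar have vector variants. The mangled names follow the vector function ABI scheme: after `_ZGV` and the `_LLVM_` ISA token come a mask token (`N` for unmasked, `M` for masked), the vector length (a number, or `x` for a scalable VF, as in `_ZGV_LLVM_Nxv_foo`), one token per parameter (`v` for a vector parameter), then `_`, the scalar function name and, in parentheses, the vector function it maps to. As far as I understand, for `x` LLVM derives the minimum lane count from the vector function's signature (here `<vscale x 2 x double>`). A simplified, hypothetical parser that handles only the forms used in this review; it is not LLVM's VFABI demangler and ignores parameter modifiers such as linear steps and alignment:

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical result of parsing "_ZGV_LLVM_<mask><vlen><params>_<scalar>(<vector>)".
struct VariantInfo {
  bool Masked;         // 'M' = masked, 'N' = not masked
  bool Scalable;       // VLEN 'x' = scalable
  unsigned FixedVLen;  // numeric VLEN; 0 when scalable
  std::string Params;  // one letter per parameter, e.g. "v"
  std::string ScalarName;
  std::string VectorName;
};

std::optional<VariantInfo> parseVariant(const std::string &S) {
  const std::string Prefix = "_ZGV_LLVM_";
  if (S.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  size_t Pos = Prefix.size();
  VariantInfo Info{};

  // Mask token.
  if (Pos >= S.size() || (S[Pos] != 'M' && S[Pos] != 'N'))
    return std::nullopt;
  Info.Masked = S[Pos++] == 'M';

  // Vector length: 'x' (scalable) or a decimal number.
  if (Pos < S.size() && S[Pos] == 'x') {
    Info.Scalable = true;
    ++Pos;
  } else {
    size_t Start = Pos;
    while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
      ++Pos;
    if (Pos == Start)
      return std::nullopt;
    Info.FixedVLen = static_cast<unsigned>(std::stoul(S.substr(Start, Pos - Start)));
  }

  // Parameter letters run up to the '_' that starts the scalar name.
  size_t Underscore = S.find('_', Pos);
  size_t Open = S.find('(', Pos);
  if (Underscore == std::string::npos || Open == std::string::npos ||
      Open < Underscore || S.back() != ')')
    return std::nullopt;
  Info.Params = S.substr(Pos, Underscore - Pos);
  Info.ScalarName = S.substr(Underscore + 1, Open - Underscore - 1);
  Info.VectorName = S.substr(Open + 1, S.size() - Open - 2);
  return Info;
}
```

Applied to the two attributes in scalable-call.ll, this yields unmasked, scalable mappings with one vector parameter, from foo to foo_vec and from bar to bar_vec; the fixed-width `_ZGV_LLVM_N2v_llvm.sin.f64(__sind2_massv)` from the review discussion parses as a VF of 2.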
