This is an archive of the discontinued LLVM Phabricator instance.

Differential D98054

[LoopVectorize][SVE] Fix crash when vectorising FP negation
Closed, Public

Authored by david-arm on Mar 5 2021, 9:21 AM.

Details

Reviewers

sdesmalen
peterwaller-arm
craig.topper
efriedma
dmgreen

Commits

rG00e65f334546: [LoopVectorize][SVE] Fix crash when vectorising FP negation

Summary

This patch fixes a crash encountered when vectorising the following loop:

void foo(float *dst, float *src, long long n) {
  for (long long i = 0; i < n; i++)
    dst[i] = -src[i];
}

using scalable vectors. I've added a test to

Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.
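
For readers less familiar with scalable vectorisation, here is a rough scalar model of what the vectorised loop is expected to compute, assuming a vectorisation factor of `vscale x 4` as used in the test file. The function name `fnegScalableModel` and the explicit `vscale` parameter are illustrative assumptions, not part of the patch; on real SVE hardware the vector length is a runtime property and the vectoriser emits IR, not C++.

```cpp
#include <cassert>

// Hypothetical model (not code from the patch): negate src into dst,
// handling vscale * 4 lanes per "vector" iteration, then a scalar remainder.
void fnegScalableModel(float *dst, const float *src, long long n,
                       unsigned vscale) {
  assert(vscale >= 1 && "vscale is a positive runtime constant");
  const long long lanes = 4LL * vscale; // models <vscale x 4 x float>
  long long i = 0;
  // Vector body: each outer trip stands in for one wide load, fneg and store.
  for (; i + lanes <= n; i += lanes)
    for (long long l = 0; l < lanes; ++l)
      dst[i + l] = -src[i + l];
  // Scalar epilogue for the remaining n % lanes elements.
  for (; i < n; ++i)
    dst[i] = -src[i];
}
```

The result is the same for every value of `vscale`; only the split between the vector body and the scalar remainder changes.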

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Mar 5 2021, 9:21 AM

Herald added a reviewer: efriedma. Mar 5 2021, 9:21 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

david-arm requested review of this revision. Mar 5 2021, 9:21 AM

Herald added a project: Restricted Project. Mar 5 2021, 9:21 AM

Herald added a subscriber: llvm-commits.

david-arm edited the summary of this revision. Mar 5 2021, 9:22 AM

Harbormaster completed remote builds in B92336: Diff 328551. Mar 6 2021, 1:54 AM

sdesmalen added inline comments. Mar 8 2021, 4:33 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7551–7552	From looking at `isScalarAfterVectorization` and `collectLoopScalars`, if the node is scalar after vectorization, it could be that:

1. VF is scalar (i.e. `VF.getFixedValue() == 1`).
2. VF is a vector, and the operation remains scalar, e.g. induction variable updates in the loop, which all remain scalar. There is special code in `collectLoopScalars` that does that analysis.
3. VF is a vector, and the operation is scalarized (i.e. `getWideningDecision(.., VF) == CM_Scalarize`).

For the case where the result is scalar after vectorization, I would have expected `VectorTy->getVectorElementType()` to be passed to `TTI.getArithmeticInstrCost`, not `VectorTy`, so I wonder if that's a bug.

Also, for case 2, we shouldn't be multiplying the cost by `N`. This code should instead check explicitly whether the node is scalarized, rather than relying on the more broadly defined `isScalarAfterVectorization`. When scalarization does happen, the cost must be multiplied by `VF.getFixedValue()` instead (which has the implicit assert that VF is not scalable), so that you can avoid adding an unnecessary branch.
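
The three cases in this comment can be sketched as a small standalone cost model. This is only an illustration under stated assumptions: `VFModel`, `Lowering` and `arithmeticCostModel` are hypothetical names made up for this sketch, and they are not LLVM's `ElementCount` or cost-model APIs. The point is that only the scalarized case multiplies by the fixed lane count, and asking a scalable VF for that count trips an assert.

```cpp
#include <cassert>

// Hypothetical stand-in for a vectorisation factor (not LLVM's ElementCount).
struct VFModel {
  unsigned MinLanes; // known minimum number of lanes
  bool Scalable;     // true if the real lane count is MinLanes * vscale

  bool isScalar() const { return !Scalable && MinLanes == 1; }

  // Models the implicit assert mentioned in the review comment.
  unsigned getFixedValue() const {
    assert(!Scalable && "fixed lane count requested for a scalable VF");
    return MinLanes;
  }
};

// How an instruction ends up after vectorisation (illustrative categories).
enum class Lowering { Widened, RemainsScalar, Scalarized };

// ScalarCost and VectorCost are assumed to come from a target cost query.
unsigned arithmeticCostModel(Lowering L, const VFModel &VF,
                             unsigned ScalarCost, unsigned VectorCost) {
  switch (L) {
  case Lowering::Widened:
    return VectorCost; // one wide instruction, valid for scalable VFs
  case Lowering::RemainsScalar:
    return ScalarCost; // a single scalar instruction, no multiplier
  case Lowering::Scalarized:
    return ScalarCost * VF.getFixedValue(); // one copy per lane, fixed VF only
  }
  return 0; // unreachable for valid enum values
}
```

With a fixed VF of 4 and a scalar cost of 2, the scalarized case costs 8, while the remains-scalar case costs 2 whether or not the VF is scalable.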

david-arm added a reviewer: dmgreen. Mar 8 2021, 5:16 AM

Attempted to address @sdesmalen's comments about the cost multiplier N. I created a new patch to remove the multiplier (D98512), since after some investigation it seems that it should always be 1.

david-arm added a parent revision: D98512: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost. Mar 12 2021, 8:22 AM

Hi @sdesmalen, thanks for pointing this out! It looks like for the cases you listed above we actually have:

1. VF is scalar, in which case N=1. The code in getInstructionCost is misleading since VectorTy is actually the element type in this case!
2. VF is a vector. If this is an instruction to be scalarised then it will live in InstsToScalarize and the higher level getInstructionCost will deal with this separately. It seems that isScalarAfterVectorization returns true if the instruction is a member of the Scalars variable. From what I can tell the members of Scalars fall into two main categories: GEPs/bitcasts with scalar uses, and induction variable updates with scalar uses. In these cases the instruction will remain as a single scalar instruction.

This patch LGTM now, thanks for the investigation!

This revision is now accepted and ready to land. Mar 12 2021, 8:33 AM

Harbormaster completed remote builds in B93506: Diff 330250. Mar 12 2021, 9:10 AM

david-arm edited parent revisions, added: D99718: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost; removed: D98512: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost. Apr 1 2021, 4:20 AM

This revision was landed with ongoing or failed builds. Apr 28 2021, 7:22 AM

Closed by commit rG00e65f334546: [LoopVectorize][SVE] Fix crash when vectorising FP negation (authored by david-arm).

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG00e65f334546: [LoopVectorize][SVE] Fix crash when vectorising FP negation.

Revision Contents

Path                                                           Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                1 line
llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll    35 lines

Diff 341193

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[7,542 preceding lines not shown; enclosing context: case Instruction::Xor: {]

     if (Op2VK == TargetTransformInfo::OK_AnyValue && Legal->isUniform(Op2))
       Op2VK = TargetTransformInfo::OK_UniformValue;

     SmallVector<const Value *, 4> Operands(I->operand_values());
     return TTI.getArithmeticInstrCost(
         I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
         Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
   }
   case Instruction::FNeg: {
-    assert(!VF.isScalable() && "VF is assumed to be non scalable.");
     return TTI.getArithmeticInstrCost(
         I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
         TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None,
         TargetTransformInfo::OP_None, I->getOperand(0), I);
   }
   case Instruction::Select: {
     SelectInst *SI = cast<SelectInst>(I);
     const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());
     bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));

[2,618 following lines not shown]

[sdesmalen left an inline comment on the FNeg case (lines 7551–7552); its text appears in the Event Timeline above.]

llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

 ; RUN: opt -loop-vectorize -dce -instcombine -mtriple aarch64-linux-gnu -mattr=+sve < %s -S | FileCheck %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64-unknown-linux-gnu"

 define void @cmpsel_i32(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i64 %n) {
 ; CHECK-LABEL: @cmpsel_i32(
 ; CHECK-NEXT: entry:
 ; CHECK: vector.body:
 ; CHECK: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* {{.*}}, align 4
 ; CHECK-NEXT: [[TMP1:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 0, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
 ; CHECK-NEXT: [[TMP2:%.*]] = select <vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 10, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
 ; CHECK: store <vscale x 4 x i32> [[TMP2]], <vscale x 4 x i32>* {{.*}}, align 4
 ;
 entry:
-  %cmp7 = icmp sgt i64 %n, 0
-  br i1 %cmp7, label %for.body, label %for.end
+  br label %for.body

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
   %0 = load i32, i32* %arrayidx, align 4
   %tobool.not = icmp eq i32 %0, 0
   %cond = select i1 %tobool.not, i32 2, i32 10
   %arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv

[14 unchanged lines not shown]

 ; CHECK-NEXT: entry:
 ; CHECK: vector.body:
 ; CHECK: [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, <vscale x 4 x float>* {{.*}}, align 4
 ; CHECK-NEXT: [[TMP1:%.*]] = fcmp ogt <vscale x 4 x float> [[WIDE_LOAD]], shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 3.000000e+00, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
 ; CHECK-NEXT: [[TMP2:%.*]] = select <vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 1.000000e+01, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 2.000000e+00, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
 ; CHECK: store <vscale x 4 x float> [[TMP2]], <vscale x 4 x float>* {{.*}}, align 4

 entry:
-  %cmp8 = icmp sgt i64 %n, 0
-  br i1 %cmp8, label %for.body, label %for.end
+  br label %for.body

 for.body: ; preds = %entry, %for.body
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
   %0 = load float, float* %arrayidx, align 4
   %cmp1 = fcmp ogt float %0, 3.000000e+00
   %conv = select i1 %cmp1, float 1.000000e+01, float 2.000000e+00
   %arrayidx3 = getelementptr inbounds float, float* %a, i64 %indvars.iv
   store float %conv, float* %arrayidx3, align 4
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond.not = icmp eq i64 %indvars.iv.next, %n
-  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !6
+  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

+for.end: ; preds = %for.body, %entry
+  ret void
+}
+
+define void @fneg_f32(float* noalias nocapture %a, float* noalias nocapture readonly %b, i64 %n) {
+; CHECK-LABEL: @fneg_f32(
+; CHECK-NEXT: entry:
+; CHECK: vector.body:
+; CHECK: [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, <vscale x 4 x float>* {{.*}}, align 4
+; CHECK-NEXT: [[TMP1:%.*]] = fneg <vscale x 4 x float> [[WIDE_LOAD]]
+; CHECK: store <vscale x 4 x float> [[TMP1]], <vscale x 4 x float>* {{.*}}, align 4
+
+entry:
+  br label %for.body
+
+for.body: ; preds = %entry, %for.body
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
+  %0 = load float, float* %arrayidx, align 4
+  %fneg = fneg float %0
+  %arrayidx3 = getelementptr inbounds float, float* %a, i64 %indvars.iv
+  store float %fneg, float* %arrayidx3, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0
+
 for.end: ; preds = %for.body, %entry
   ret void
 }

 !0 = distinct !{!0, !1, !2, !3, !4, !5}
 !1 = !{!"llvm.loop.mustprogress"}
 !2 = !{!"llvm.loop.vectorize.width", i32 4}
 !3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
 !4 = !{!"llvm.loop.interleave.count", i32 1}
 !5 = !{!"llvm.loop.vectorize.enable", i1 true}
-!6 = distinct !{!6, !1, !2, !3, !4, !5}