This is an archive of the discontinued LLVM Phabricator instance.

Differential D105824

[LV] Avoid scalable vectorization for loops containing alloca
Closed · Public

Authored by kmclaughlin on Jul 12 2021, 9:09 AM.

Details

Reviewers

sdesmalen
dmgreen
CarolineConcatto
david-arm

Commits

rG49d73130ca17: [LV] Avoid scalable vectorization for loops containing alloca

Summary

This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable. Otherwise, the vectorizer crashes when
it attempts to scalarize the alloca in loops that contain one.
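Why is an Invalid cost on one instruction enough to rule out a scalable VF for the whole loop? Below is a minimal C++ sketch under a simplified model in which a cost is either a number or invalid. The `Cost` alias and the `loopCost` helper are hypothetical stand-ins, not LLVM's `InstructionCost` API. Summing the per-instruction costs for a candidate VF yields an invalid total as soon as any single instruction is invalid, which is the behaviour the "Instruction with invalid costs prevented vectorization" remarks in the test describe.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a cost that may be invalid:
// std::nullopt means "cannot be costed at this VF".
using Cost = std::optional<unsigned>;

// Sums the per-instruction costs of a loop body for one candidate VF.
// A single invalid cost poisons the total, so that VF is discarded.
Cost loopCost(const std::vector<Cost> &InstCosts) {
  unsigned Total = 0;
  for (const Cost &C : InstCosts) {
    if (!C)
      return std::nullopt;
    Total += *C;
  }
  return Total;
}
```

For example, a body whose instructions cost `{0, 2, 0, 1, 1, 1}` at a fixed VF totals 5. The same body with the alloca's entry replaced by `std::nullopt` at a scalable VF yields `std::nullopt`. The numbers are made up for illustration.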

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision. Jul 12 2021, 9:09 AM

Herald added a subscriber: hiraditya. Jul 12 2021, 9:09 AM

kmclaughlin requested review of this revision. Jul 12 2021, 9:09 AM

Herald added a project: Restricted Project. Jul 12 2021, 9:09 AM

Herald added a subscriber: llvm-commits.

sdesmalen added inline comments. Jul 12 2021, 9:19 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Line 7907: Can you add a comment describing why we don't want to vectorize this for scalable vectors?
llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
Line 30: Can you force the loop to be vectorized with a scalable VF using a loop hint? With the current RUN line and without hints, the vectorizer may make its decision for other reasons (like considering a fixed VF cheaper) rather than outright ignoring a scalable VF that was otherwise forced but had an invalid cost. It may also be worth rebasing your patch on `D105806` and adding a CHECK line for one of the remarks (although the remarks may still change depending on review feedback).
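The distinction drawn in this comment can be sketched in C++. This is a hypothetical, simplified model: the `VF`, `Candidate`, `Decision` and `selectVF` names are illustrative, and falling back to the cheapest remaining candidate by raw cost is an assumption, not the LV's actual selection logic. A forced VF with a valid cost is honoured. A forced VF with an invalid cost is dropped with the "UserVF ignored because of invalid costs." remark. Only the second path depends on the new alloca cost, which is why forcing a scalable VF with a hint makes the test meaningful.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical VF model: minimum lane count, optionally scaled by vscale.
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

// Hypothetical candidate: a VF plus its loop cost (std::nullopt = invalid).
struct Candidate {
  VF Factor;
  std::optional<unsigned> Cost;
};

struct Decision {
  VF Chosen;
  std::vector<std::string> Remarks;
};

// A user-forced VF with a valid cost is used as-is. If its cost is invalid,
// the hint is dropped with a remark and the cheapest valid candidate is
// chosen instead; with no valid candidate, the loop stays scalar (VF = 1).
Decision selectVF(const Candidate &UserVF,
                  const std::vector<Candidate> &Others) {
  if (UserVF.Cost)
    return {UserVF.Factor, {}};

  Decision D{VF{1, false}, {"UserVF ignored because of invalid costs."}};
  std::optional<unsigned> Best;
  for (const Candidate &C : Others) {
    if (C.Cost && (!Best || *C.Cost < *Best)) {
      Best = C.Cost;
      D.Chosen = C.Factor;
    }
  }
  return D;
}
```

With a forced `vscale x 2` whose cost is invalid and fixed candidates costing 6 (VF=2) and 4 (VF=4), this model emits the remark and picks VF=4.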

david-arm added inline comments. Jul 12 2021, 9:29 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
Line 33: Do we need this `alloca` and the call to `@foo` at the bottom? I wonder if it may be simpler to pass an `i32** %vla` pointer to the function? I think the main thing we care about is the `alloca` in the loop?
Line 34: nit: I think we can simplify the test here by just branching directly, i.e. `br label %for.body`, without the `icmp`.

Harbormaster completed remote builds in B113519: Diff 357961. Jul 12 2021, 10:00 AM

Changes to scalable-alloca.ll:

Added a loop hint to force vectorization with a scalable VF
Rebased the patch and added CHECK lines for Invalid costs found in the loop (from D105806)
Passed i32** %vla to the @alloca function

Harbormaster completed remote builds in B114031: Diff 358664. Jul 14 2021, 11:52 AM

CarolineConcatto added inline comments. Jul 15 2021, 3:58 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
Line 2: Would it make sense to use the Python script to generate the check lines here?

sdesmalen added inline comments. Jul 15 2021, 4:44 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Line 7909: nit: this line is redundant, because the code says as much.
llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll
Line 2: @CarolineConcatto I don't think that is necessary for this specific patch, because this test is not guarding the fixed-width vectorization capabilities of the LV, nor the AArch64 CostModel.
Line 4: Can you also add: `CHECK-REMARKS: UserVF ignored because of invalid costs.`
Line 9: I'd suggest removing these CHECK lines, because whether this loop vectorizes with fixed-width vectors is down to the cost model, which means we may have to regenerate the check lines at some point even though they're not needed for what you're trying to test.

Removed unnecessary CHECK lines from scalable-alloca.ll
Added a check for the "UserVF ignored because of invalid costs" remark

Harbormaster completed remote builds in B114219: Diff 358940. Jul 15 2021, 6:37 AM

LGTM, thanks.

Just FYI, you'll want to wait to land it until I've relanded D105806.

This revision is now accepted and ready to land. Jul 15 2021, 7:27 AM

This revision was landed with ongoing or failed builds. Jul 16 2021, 3:48 AM

Closed by commit rG49d73130ca17: [LV] Avoid scalable vectorization for loops containing alloca (authored by kmclaughlin).

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rG49d73130ca17: [LV] Avoid scalable vectorization for loops containing alloca.

Revision Contents

Path                                                           Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                6 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll  31 lines

Diff 359276

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

...
   case Instruction::Call: {
     if (getVectorIntrinsicIDForCall(CI, TLI)) {
       InstructionCost IntrinsicCost = getVectorIntrinsicCost(CI, VF);
       return std::min(CallCost, IntrinsicCost);
     }
     return CallCost;
   }
   case Instruction::ExtractValue:
     return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);
+  case Instruction::Alloca:
+    // We cannot easily widen alloca to a scalable alloca, as
+    // the result would need to be a vector of pointers.
+    if (VF.isScalable())
+      return InstructionCost::getInvalid();
+    LLVM_FALLTHROUGH;
   default:
     // This opcode is unknown. Assume that it is the same as 'mul'.
     return TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy, CostKind);
   } // end of switch.
 }

 char LoopVectorize::ID = 0;
...
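The new case relies on falling through to `default` for fixed VFs. Below is a minimal model of that control flow. It assumes hypothetical opcode, VF and cost types, plus made-up cost constants in place of the TTI hooks, so it is not the real `getInstructionCost()`. `[[fallthrough]]` plays the role of `LLVM_FALLTHROUGH`: a fixed-width alloca is still costed like a 'mul', while a scalable one gets an invalid cost.

```cpp
#include <optional>

// Hypothetical, heavily simplified model of the patched cost switch.
enum class Opcode { ExtractValue, Alloca, Other };

struct VF {
  unsigned MinLanes;
  bool Scalable;
};

using Cost = std::optional<unsigned>; // std::nullopt = invalid cost

// Hypothetical stand-ins for the TTI cost hooks used in the real switch.
constexpr unsigned ExtractValueCost = 0;
constexpr unsigned MulCostPerLane = 1;

Cost instructionCost(Opcode Op, VF Factor) {
  switch (Op) {
  case Opcode::ExtractValue:
    return ExtractValueCost;
  case Opcode::Alloca:
    // Widening would need a vector of pointers, and scalarizing the
    // alloca for a scalable VF is what crashed before the patch.
    if (Factor.Scalable)
      return std::nullopt;
    [[fallthrough]];
  default:
    // Unknown opcode: cost it like a vector 'mul' (made-up per-lane cost).
    return MulCostPerLane * Factor.MinLanes;
  }
}
```

In this model, an alloca at a fixed VF of 4 costs 4, the same as any unknown opcode, while the same alloca at `vscale x 4` returns `std::nullopt`.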

llvm/test/Transforms/LoopVectorize/AArch64/scalable-alloca.ll

This file was added.

; RUN: opt -S -loop-vectorize -mattr=+sve -mtriple aarch64-unknown-linux-gnu -force-vector-width=2 -scalable-vectorization=preferred -pass-remarks-analysis=loop-vectorize -pass-remarks-missed=loop-vectorize < %s 2>%t | FileCheck %s
; RUN: FileCheck %s --check-prefix=CHECK-REMARKS < %t

; CHECK-REMARKS: UserVF ignored because of invalid costs.
; CHECK-REMARKS: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2): alloca
; CHECK-REMARKS: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
define void @alloca(i32** %vla, i64 %N) {
; CHECK-LABEL: @alloca(
; CHECK-NOT: <vscale x

entry:
  br label %for.body

for.body:
  %iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
  %alloca = alloca i32, align 16
  %arrayidx = getelementptr inbounds i32*, i32** %vla, i64 %iv
  store i32* %alloca, i32** %arrayidx, align 8
  %iv.next = add nuw nsw i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, %N
  br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

for.end:
  call void @foo(i32** nonnull %vla)
  ret void
}

declare void @foo(i32**)

!0 = !{!0, !1}
!1 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}