Details

Reviewers

sdesmalen
efriedma
kmclaughlin
david-arm
fhahn

Commits

rGa36d269658df: [VPlan] Avoid collecting scalars for SVE

Summary

This patch ensures that scalars, other than uniforms, are no
longer collected for scalable vectorization before the LVP
planning phase.

This avoids generating scalarized instructions later, during
the LVP execute phase, since scalarization is not supported
for scalable vectors.

A relevant test has also been added.
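
The early exit described in the summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyVF`, `ToyCostModel`, and `FixedWidthCandidates` are hypothetical names invented for illustration, instructions are represented by plain strings, and the fixed-width path is reduced to a pre-computed input. It is not the LLVM implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for llvm::ElementCount: a minimum lane count plus a
// flag saying whether the real lane count is a runtime multiple of it.
struct ToyVF {
  unsigned MinLanes;
  bool Scalable;

  bool isVector() const {
    return (Scalable && MinLanes != 0) || MinLanes > 1;
  }

  // Ordering so that ToyVF can be used as a std::map key.
  bool operator<(const ToyVF &O) const {
    if (Scalable != O.Scalable)
      return Scalable < O.Scalable;
    return MinLanes < O.MinLanes;
  }
};

using InstSet = std::set<std::string>; // instructions identified by name

// Hypothetical, heavily simplified cost model.
struct ToyCostModel {
  std::map<ToyVF, InstSet> Uniforms; // assumed filled by an earlier analysis
  std::map<ToyVF, InstSet> Scalars;
  // Assumption: stands in for whatever the worklist analysis would find
  // for fixed-width vectorization.
  InstSet FixedWidthCandidates;

  void collectLoopScalars(ToyVF VF) {
    assert(VF.isVector() && Scalars.find(VF) == Scalars.end() &&
           "This function should not be visited twice for the same VF");

    // Scalable VF: only uniforms are recorded as scalars, so nothing can
    // later be planned for replication (scalarization).
    if (VF.Scalable) {
      Scalars[VF] = Uniforms[VF];
      return;
    }

    // Fixed-width VF: the worklist result plus the uniforms.
    Scalars[VF] = FixedWidthCandidates;
    Scalars[VF].insert(Uniforms[VF].begin(), Uniforms[VF].end());
  }
};
```

In this model, calling `collectLoopScalars` with a scalable `ToyVF` leaves `Scalars[VF]` equal to `Uniforms[VF]`, while a fixed-width `ToyVF` may still pick up additional non-uniform scalars.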

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

malharJ created this revision. · Mar 11 2022, 2:00 AM

Herald added a reviewer: efriedma. · Mar 11 2022, 2:00 AM

Herald added a project: Restricted Project.

Herald added subscribers: ctetreau, tschuett, psnobl and 3 others.

malharJ requested review of this revision. · Mar 11 2022, 2:00 AM

Herald added subscribers: llvm-commits, alextsao1999, vkmr.

sdesmalen added reviewers: kmclaughlin, david-arm, fhahn. · Mar 11 2022, 2:32 AM

Hi @malharJ, thanks for this fix! I believe this is doing the right thing, because for scalable vectors an instruction should either be considered uniform after vectorization, or it shouldn't be considered scalar.

Can you replace any references to SVE with "scalable vectors" in the commit message and title? This patch isn't specific to AArch64 SVE, but rather applies to scalable vectors in general.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4458–4460	nit: `s/during planning [...] for SVE./for scalable vectors./`

Harbormaster completed remote builds in B153742: Diff 414617. · Mar 11 2022, 2:51 AM

Updated terminology to use scalable vectors instead of SVE

malharJ marked an inline comment as done. · Mar 11 2022, 3:42 AM

malharJ retitled this revision from [SVE][VPlan] Avoid collecting scalars for SVE to [VPlan] Avoid collecting scalars for SVE.

malharJ edited the summary of this revision.

malharJ retitled this revision from [VPlan] Avoid collecting scalars for SVE to [VPlan] Avoid scalarization for scalable vectors. · Mar 11 2022, 3:44 AM

fhahn added inline comments. · Mar 11 2022, 3:45 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6	Does this fail without the change or does this need `2>&1`?
20	could use some better names
24	Is the bitcast needed or could this just load `i64`? It would also be good to make sure the load is not dead in the loop.

sdesmalen added inline comments. · Mar 11 2022, 3:51 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6	I'd actually prefer this test to check the output of LoopVectorize so that we can make sure the output is as expected, as opposed to checking for the absence of a failure. @malharJ You can probably use the update_test_checks script to generate CHECK lines for the output.

malharJ added inline comments. · Mar 11 2022, 3:56 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6	This test will fail without this change, so I don't think 2>&1 is needed.
24	The bitcast is actually needed. It keeps the gep from being classified as 'uniform', which this test case requires in order to trigger the assertion failure (without this patch). `collectLoopUniforms()` looks at the load/store and places its pointer operand (the bitcast here) into a worklist. It then iterates over the bitcast's operands (in this case the gep) and checks each for uniformity. Since the gep has one use outside the loop, it is marked as not uniform. Consequently the loop induction update variable is also marked as not uniform, and instead as a scalar (REPLICATE recipe), which causes the assertion failure. Regarding the dead load, I was just trying to create a minimal test case.
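
The worklist behaviour described in the comment above can be sketched as a standalone toy. It assumes only what the comment states: the pointer operand of a memory access seeds the worklist, and an operand is added only if every one of its users is already in the worklist. `ToyInst`, `ToyFunction`, and `collectToyUniforms` are hypothetical names, instructions are plain strings, and the separate handling of induction variables is deliberately left out. This is not the `collectLoopUniforms()` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: the names of its operands and whether it
// lives inside the loop being vectorized.
struct ToyInst {
  std::vector<std::string> Operands;
  bool InLoop;
};

using ToyFunction = std::map<std::string, ToyInst>; // name -> instruction

// Simplified worklist walk. SeedPtrs are the pointer operands of the loop's
// loads/stores and must name instructions present in F.
std::set<std::string>
collectToyUniforms(const ToyFunction &F,
                   const std::vector<std::string> &SeedPtrs) {
  std::vector<std::string> Worklist(SeedPtrs);
  std::set<std::string> InWorklist(SeedPtrs.begin(), SeedPtrs.end());

  // True if every instruction that uses Name is already in the worklist.
  // A user outside the loop is never in the worklist, so it blocks Name.
  auto AllUsersInWorklist = [&](const std::string &Name) {
    return std::all_of(
        F.begin(), F.end(), [&](const ToyFunction::value_type &Entry) {
          const std::vector<std::string> &Ops = Entry.second.Operands;
          bool Uses = std::find(Ops.begin(), Ops.end(), Name) != Ops.end();
          return !Uses || InWorklist.count(Entry.first) > 0;
        });
  };

  for (std::size_t Idx = 0; Idx < Worklist.size(); ++Idx) {
    // Copy the name: Worklist may grow (and reallocate) inside the loop.
    const std::string Current = Worklist[Idx];
    assert(F.count(Current) && "worklist entries must be known instructions");
    for (const std::string &Op : F.at(Current).Operands) {
      auto It = F.find(Op);
      // Skip function arguments, out-of-loop values, and visited operands.
      if (It == F.end() || !It->second.InLoop || InWorklist.count(Op))
        continue;
      if (AllUsersInWorklist(Op)) {
        Worklist.push_back(Op);
        InWorklist.insert(Op);
      }
    }
  }
  return InWorklist;
}
```

Modelling the test case with a `bitcast` seed whose operand `gep` is also used by a store outside the loop, the gep is not added to the result; removing that outside store lets the gep through.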

Harbormaster completed remote builds in B153750: Diff 414630. · Mar 11 2022, 4:42 AM

Updated LIT test to use autogenerated CHECKs using update_test_checks.py

Harbormaster completed remote builds in B154208: Diff 415257. · Mar 14 2022, 5:13 PM

malharJ marked an inline comment as done. · Mar 15 2022, 4:52 AM

sdesmalen added inline comments. · Mar 15 2022, 5:36 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
9	nit: s/it (and consequently [...] variable) from being classified as 'uniform'./ the gep and consequently the loop induction update variable from being classified as 'uniform'/
43	It looks like this loop has been vectorized with an Interleave Factor of 2. Can you limit that using `-force-vector-interleave=1`, to reduce the number of CHECK lines?
101–103	nit: this alloca and the subsequent load/store seem unnecessary. Maybe you can just pass in some `i32 %N` as function argument for the number of iterations.
108	Can you make a loop that increments instead of decrements? That avoids the calls to `@llvm.experimental.vector.reverse.nxv2f64` and makes the CHECK lines a bit simpler.

sdesmalen mentioned this in D121690: [VPlan] Don't collect some values as scalars. · Mar 15 2022, 5:47 AM

addressed review comments to reduce size of test output

Harbormaster completed remote builds in B154590: Diff 415812. · Mar 16 2022, 6:50 AM

Thanks for the fix @malharJ, I'm happy with the patch now.

This revision is now accepted and ready to land. · Mar 16 2022, 6:53 AM

fhahn added inline comments. · Mar 16 2022, 6:55 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6	> this test will fail without this change .. I don't think 2>&1 is needed.
	Oh right, it originally failed due to the crash, not the mismatch. The current checks should be sufficient now.
81	Might be good to clean up the basic block names here a bit.

This revision was landed with ongoing or failed builds. · Mar 16 2022, 9:34 AM

Closed by commit rGa36d269658df: [VPlan] Avoid collecting scalars for SVE (authored by malharJ).

This revision was automatically updated to reflect the committed changes.

malharJ marked 2 inline comments as done.

malharJ added a commit: rGa36d269658df: [VPlan] Avoid collecting scalars for SVE.

malharJ added inline comments. · Mar 16 2022, 9:34 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
20	Done as part of final commit.
81	Done as part of final commit.

Diff 415869

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

 void LoopVectorizationCostModel::collectLoopScalars(ElementCount VF) {
   // We should not collect Scalars more than once per VF. Right now, this
   // function is called from collectUniformsAndScalars(), which already does
   // this check. Collecting Scalars for VF=1 does not make any sense.
   assert(VF.isVector() && Scalars.find(VF) == Scalars.end() &&
          "This function should not be visited twice for the same VF");

+  // This avoids any chances of creating a REPLICATE recipe during planning
+  // since that would result in generation of scalarized code during execution,
+  // which is not supported for scalable vectors.
+  if (VF.isScalable()) {
+    Scalars[VF].insert(Uniforms[VF].begin(), Uniforms[VF].end());
+    return;
+  }
+
   SmallSetVector<Instruction *, 8> Worklist;

   // These sets are used to seed the analysis with pointers used by memory
   // accesses that will remain scalar.
   SmallSetVector<Instruction *, 8> ScalarPtrs;
   SmallPtrSet<Instruction *, 8> PossibleNonScalarPtrs;
   auto *Latch = TheLoop->getLoopLatch();

llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -mtriple=aarch64 -loop-vectorize --force-vector-interleave=1 -S | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				; The test checks that scalarized code is not generated for SVE.
				; It creates a scenario where the gep instruction is used outside
				; the loop, preventing the gep (and consequently the loop induction
				; update variable) from being classified as 'uniform'.

				define void @test_no_scalarization(i64* %a, i32 %idx, i32 %n) #0 {
				; CHECK-LABEL: @test_no_scalarization(
				; CHECK-NEXT: L.entry:
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[IDX:%.*]], 1
				; CHECK-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[N:%.*]], i32 [[TMP0]])
				; CHECK-NEXT: [[TMP1:%.*]] = sub i32 [[SMAX]], [[IDX]]
				; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP3:%.*]] = mul i32 [[TMP2]], 2
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], [[TMP3]]
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP5:%.*]] = mul i32 [[TMP4]], 2
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP1]], [[TMP5]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP1]], [[N_MOD_VF]]
				; CHECK-NEXT: [[IND_END:%.*]] = add i32 [[IDX]], [[N_VEC]]
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i32> poison, i32 [[IDX]], i32 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i32> [[DOTSPLATINSERT]], <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 2 x i32> @llvm.experimental.stepvector.nxv2i32()
				; CHECK-NEXT: [[TMP7:%.*]] = add <vscale x 2 x i32> [[TMP6]], zeroinitializer
				; CHECK-NEXT: [[TMP8:%.*]] = mul <vscale x 2 x i32> [[TMP7]], shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i32> [[DOTSPLAT]], [[TMP8]]
				; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP10:%.*]] = mul i32 [[TMP9]], 2
				; CHECK-NEXT: [[TMP11:%.*]] = mul i32 1, [[TMP10]]
				; CHECK-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 2 x i32> poison, i32 [[TMP11]], i32 0
				; CHECK-NEXT: [[DOTSPLAT2:%.*]] = shufflevector <vscale x 2 x i32> [[DOTSPLATINSERT1]], <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 2 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i64, i64* [[A:%.*]], <vscale x 2 x i32> [[VEC_IND]]
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <vscale x 2 x i64*> [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP14:%.*]] = bitcast i64* [[TMP13]] to double*
				; CHECK-NEXT: [[TMP15:%.*]] = getelementptr double, double* [[TMP14]], i32 0
				; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[TMP15]] to <vscale x 2 x double>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x double>, <vscale x 2 x double>* [[TMP16]], align 8
				; CHECK-NEXT: [[TMP17:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP18:%.*]] = mul i32 [[TMP17]], 2
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], [[TMP18]]
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 2 x i32> [[VEC_IND]], [[DOTSPLAT2]]
				; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
				; CHECK-NEXT: [[TMP20:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP21:%.*]] = mul i32 [[TMP20]], 2
				; CHECK-NEXT: [[TMP22:%.*]] = sub i32 [[TMP21]], 1
				; CHECK-NEXT: [[TMP23:%.*]] = extractelement <vscale x 2 x i64*> [[TMP12]], i32 [[TMP22]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[L_EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[IDX]], [[L_ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[L_LOOPBODY:%.*]]
				; CHECK: L.LoopBody:
				; CHECK-NEXT: [[INDVAR:%.*]] = phi i32 [ [[INDVAR_NEXT:%.*]], [[L_LOOPBODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; CHECK-NEXT: [[INDVAR_NEXT]] = add nsw i32 [[INDVAR]], 1
				; CHECK-NEXT: [[TMP24:%.*]] = getelementptr i64, i64* [[A]], i32 [[INDVAR]]
				; CHECK-NEXT: [[TMP25:%.*]] = bitcast i64* [[TMP24]] to double*
				; CHECK-NEXT: [[TMP26:%.*]] = load double, double* [[TMP25]], align 8
				; CHECK-NEXT: [[TMP27:%.*]] = icmp slt i32 [[INDVAR_NEXT]], [[N]]
				; CHECK-NEXT: br i1 [[TMP27]], label [[L_LOOPBODY]], label [[L_EXIT]], !llvm.loop [[LOOP2:![0-9]+]]
				; CHECK: L.exit:
				; CHECK-NEXT: [[DOTLCSSA:%.*]] = phi i64* [ [[TMP24]], [[L_LOOPBODY]] ], [ [[TMP23]], [[MIDDLE_BLOCK]] ]
				; CHECK-NEXT: store i64 1, i64* [[DOTLCSSA]], align 8
				; CHECK-NEXT: ret void
				;
				L.entry:
				br label %L.LoopBody

				L.LoopBody: ; preds = %L.LoopBody, %L.entry
				%indvar = phi i32 [ %indvar.next, %L.LoopBody ], [ %idx, %L.entry ]
				%indvar.next = add nsw i32 %indvar, 1
				%0 = getelementptr i64, i64* %a, i32 %indvar
				%1 = bitcast i64* %0 to double*
				%2 = load double, double* %1, align 8
				%3 = icmp slt i32 %indvar.next, %n
				br i1 %3, label %L.LoopBody, label %L.exit

				L.exit: ; preds = %L.LoopBody
				store i64 1, i64* %0, align 8
				ret void
				}

				attributes #0 = { nofree norecurse noreturn nosync nounwind "target-features"="+sve" }


This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Avoid scalarization for scalable vectors.
ClosedPublic