
Details

Reviewers

efriedma
david-arm
sdesmalen
gilr
fhahn

Commits

rGcf06c8eee3a5: [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer…

Summary

The function fixReduction used to assert (and therefore crash) for scalable
vectors when a vector reduction could be performed in a smaller type.
This patch removes that assertion, because it is safe to use scalable vectors
for the vector reduce and the truncate.
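
To illustrate what "performed in a smaller type" means here, the following is a minimal scalar sketch, not code from the patch. It assumes an add reduction whose accumulator is masked to 8 bits on every iteration; the function names `sumWide` and `sumNarrow` are hypothetical. Because addition modulo 256 is unaffected by the upper bits, truncating the wide result gives the same value as accumulating in 8 bits throughout, and this holds lane by lane regardless of how many lanes a vector has.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reference: accumulate in 32 bits, masking the running
// value to 8 bits before every add.
uint32_t sumWide(const std::vector<uint8_t> &A, uint32_t Start) {
  uint32_t Sum = Start;
  for (uint8_t V : A)
    Sum = (Sum & 255u) + V;
  return Sum;
}

// Hypothetical narrow version: the same reduction carried out in 8 bits.
uint8_t sumNarrow(const std::vector<uint8_t> &A, uint8_t Start) {
  uint8_t Sum = Start;
  for (uint8_t V : A)
    Sum = static_cast<uint8_t>(Sum + V); // wraps modulo 256
  return Sum;
}
```

Truncating `sumWide(A, S)` to `uint8_t` yields the same value as `sumNarrow(A, S)` for any input, which is the property the truncate-then-extend step relies on.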

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	300 ms	x64 windows > lld.MachO::reproduce.s

Event Timeline

CarolineConcatto created this revision. Apr 25 2021, 10:29 AM

Herald added a reviewer: efriedma. Apr 25 2021, 10:29 AM

Herald added subscribers: psnobl, hiraditya, tschuett.

CarolineConcatto requested review of this revision. Apr 25 2021, 10:29 AM

Herald added a project: Restricted Project. Apr 25 2021, 10:29 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B100819: Diff 340369. Apr 25 2021, 11:14 AM

CarolineConcatto added reviewers: david-arm, sdesmalen, gilr, fhahn. Apr 26 2021, 1:14 AM

The fix looks good thanks! Just a minor comment about the test ...

llvm/test/Transforms/LoopVectorize/sve-reduction-inloop.ll
13	Hi @CarolineConcatto, I presume this is a truncation of the PHI value? If possible I think it might be nice to have CHECK lines for the PHI values to see them being truncated. I think that ZEXT1 and ZEXT2 will also be the incoming values for the PHI node too.

Address the reviewer's comment about the checks in the LLVM IR test

CarolineConcatto added inline comments. Apr 26 2021, 9:34 AM

llvm/test/Transforms/LoopVectorize/sve-reduction-inloop.ll
13	Hey @david-arm, I hope I've addressed your comment. I've run: ../llvm/utils/update_test_checks.py --opt=./bin/opt ../llvm/test/Transforms/LoopVectorize/sve-reduction-inloop.ll and removed some checks that I thought were not needed. But I can keep the entire output of update_test_checks.py if you prefer.

Harbormaster completed remote builds in B100963: Diff 340561. Apr 26 2021, 11:24 AM

LGTM! Thanks for updating the tests - I think it's a stronger test now with the PHIs and makes it clearer what's going on. :)

llvm/test/Transforms/LoopVectorize/sve-reduction-inloop.ll
10	nit: Maybe here and in the PHI below it's good to show where TMP34 and TMP36 come from too, i.e. instead of `{{%.*}}` you can just write `vector.body`.
14	nit: It's up to you, but if you prefer a smaller set of CHECK lines you can probably kill off lines TMP21 to TMP25 and just have simple CHECK lines for the loads, i.e. ; CHECK: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i8>, <vscale x 8 x i8>* and lower down ; CHECK: [[WIDE_LOAD2:%.*]] = load <vscale x 8 x i8>, <vscale x 8 x i8>*

This revision is now accepted and ready to land. Apr 27 2021, 12:15 AM

Address the reviewer's comments about the test

Hi @david-arm,
I have addressed your comments.
I also moved the test to the AArch64 folder. I believe that is the correct folder for it.
Carol

fhahn added inline comments. Apr 30 2021, 6:38 AM

llvm/test/Transforms/LoopVectorize/AArch64/sve-reduction-inloop.ll
2 ↗	(On Diff #341875)	Can this test be target independent by using `-force-target-supports-scalable-vectors` or does it contain any AArch64 cost-modeling?
41 ↗	(On Diff #341875)	might be helpful to use a nicer name for the loop block to make the test a bit easier to parse, maybe `%loop`? Same for `._crit_edge`, perhaps `%exit`?

LGTM! Thanks for making the changes. :)

llvm/test/Transforms/LoopVectorize/AArch64/sve-reduction-inloop.ll
15 ↗	(On Diff #341875)	nit: I think you can also just remove lines TMP22-24 as they're unused now.

Harbormaster completed remote builds in B101909: Diff 341875. Apr 30 2021, 7:20 AM

- Update the test to use -force-target-supports-scalable-vectors
- Move and rename the test to scalable-reduction-inloop.ll
- Remove unnecessary checks

llvm/test/Transforms/LoopVectorize/AArch64/sve-reduction-inloop.ll
2 ↗	(On Diff #341875)	Thank you @fhahn for that. I don't see a direct dependency between this test (or the fix) and the cost model, but I could be wrong. I've added the flag as you suggested and it works fine, so I've moved the test outside the AArch64 folder. Is that OK?
15 ↗	(On Diff #341875)	Thank you @david-arm. For some reason this test was previously failing without these lines, but now it passes. I was probably doing something wrong.
41 ↗	(On Diff #341875)	Hey @fhahn, thank you for your input. I have changed .lr.ph to loop and ._crit_edge to exit. Just for reference, this test has the same LLVM IR as reduction-inloop.ll, but now uses scalable vectors instead of fixed-width vectors.

Harbormaster completed remote builds in B101955: Diff 341943. Apr 30 2021, 11:13 AM

LGTM! Thanks for making the changes - seems like all review comments have been addressed. :)

Thanks for the updates! LGTM

llvm/test/Transforms/LoopVectorize/AArch64/sve-reduction-inloop.ll
41 ↗	(On Diff #341875)	Thanks for improving it!

Closed by commit rGcf06c8eee3a5: [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer… (authored by CarolineConcatto). May 7 2021, 1:39 AM

This revision was automatically updated to reflect the committed changes.

CarolineConcatto added a commit: rGcf06c8eee3a5: [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer….

Diff 340561

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


(4,344 lines not shown)
  if (Cost->foldTailByMasking() && !IsInLoopReductionPhi) {
  (lines not shown)
  }
  }

  // If the vector reduction can be performed in a smaller type, we truncate
  // then extend the loop exit value to enable InstCombine to evaluate the
  // entire expression in the smaller type.
  if (VF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
    assert(!IsInLoopReductionPhi && "Unexpected truncated inloop reduction!");
-   assert(!VF.isScalable() && "scalable vectors not yet supported.");
    Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);
    Builder.SetInsertPoint(
        LI->getLoopFor(LoopVectorBody)->getLoopLatch()->getTerminator());
    VectorParts RdxParts(UF);
    for (unsigned Part = 0; Part < UF; ++Part) {
      RdxParts[Part] = State.get(LoopExitInstDef, Part);
      Value *Trunc = Builder.CreateTrunc(RdxParts[Part], RdxVecTy);
      Value *Extnd = RdxDesc.isSigned() ? Builder.CreateSExt(Trunc, VecTy)
(5,734 lines not shown)

llvm/test/Transforms/LoopVectorize/sve-reduction-inloop.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -S | FileCheck %s

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				define i8 @reduction_add_trunc(i8* noalias nocapture %A) {
				; CHECK-LABEL: @reduction_add_trunc(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, {{%.*}} ], [ [[INDEX_NEXT:%.*]], {{%.*}} ]
				; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 8 x i32> [ insertelement (<vscale x 8 x i32> zeroinitializer, i32 255, i32 0), {{%.*}} ], [ [[TMP34:%.*]], {{%.*}} ]
				; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <vscale x 8 x i32> [ zeroinitializer, {{%.*}} ], [ [[TMP36:%.*]], {{%.*}} ]
				; CHECK: [[TMP14:%.*]] = and <vscale x 8 x i32> [[VEC_PHI]], shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 255, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP15:%.*]] = and <vscale x 8 x i32> [[VEC_PHI1]], shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 255, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK: [[TMP21:%.*]] = bitcast i8* {{%.*}} to <vscale x 8 x i8>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i8>, <vscale x 8 x i8>* [[TMP21]], align 4
				; CHECK-NEXT: [[TMP22:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP23:%.*]] = mul i32 [[TMP22]], 8
				; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i8, i8* {{%.*}}, i32 [[TMP23]]
				; CHECK: [[TMP25:%.*]] = bitcast i8* [[TMP24]] to <vscale x 8 x i8>*
				; CHECK-NEXT: [[WIDE_LOAD2:%.*]] = load <vscale x 8 x i8>, <vscale x 8 x i8>* [[TMP25]], align 4
				; CHECK-NEXT: [[TMP26:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD]] to <vscale x 8 x i32>
				; CHECK-NEXT: [[TMP27:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD2]] to <vscale x 8 x i32>
				; CHECK-NEXT: [[TMP28:%.*]] = add <vscale x 8 x i32> [[TMP14]], [[TMP26]]
				; CHECK-NEXT: [[TMP29:%.*]] = add <vscale x 8 x i32> [[TMP15]], [[TMP27]]
				; CHECK-NEXT: [[TMP30:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP31:%.*]] = mul i32 [[TMP30]], 16
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], [[TMP31]]
				; CHECK-NEXT: [[TMP32:%.*]] = icmp eq i32 [[INDEX_NEXT]], {{%.*}}
				; CHECK-NEXT: [[TMP33:%.*]] = trunc <vscale x 8 x i32> [[TMP28]] to <vscale x 8 x i8>
				; CHECK-NEXT: [[TMP34]] = zext <vscale x 8 x i8> [[TMP33]] to <vscale x 8 x i32>
				; CHECK-NEXT: [[TMP35:%.*]] = trunc <vscale x 8 x i32> [[TMP29]] to <vscale x 8 x i8>
				; CHECK-NEXT: [[TMP36]] = zext <vscale x 8 x i8> [[TMP35]] to <vscale x 8 x i32>
				; CHECK: middle.block:
				; CHECK-NEXT: [[TMP37:%.*]] = trunc <vscale x 8 x i32> [[TMP34]] to <vscale x 8 x i8>
				; CHECK-NEXT: [[TMP38:%.*]] = trunc <vscale x 8 x i32> [[TMP36]] to <vscale x 8 x i8>
				; CHECK-NEXT: [[BIN_RDX:%.*]] = add <vscale x 8 x i8> [[TMP38]], [[TMP37]]
				; CHECK-NEXT: [[TMP39:%.*]] = call i8 @llvm.vector.reduce.add.nxv8i8(<vscale x 8 x i8> [[BIN_RDX]])
				; CHECK-NEXT: [[TMP40:%.*]] = zext i8 [[TMP39]] to i32
				;
				entry:
				br label %.lr.ph

				.lr.ph: ; preds = %entry, %.lr.ph
				%indvars.iv = phi i32 [ %indvars.iv.next, %.lr.ph ], [ 0, %entry ]
				%sum.02p = phi i32 [ %l9, %.lr.ph ], [ 255, %entry ]
				%sum.02 = and i32 %sum.02p, 255
				%l2 = getelementptr inbounds i8, i8* %A, i32 %indvars.iv
				%l3 = load i8, i8* %l2, align 4
				%l3e = zext i8 %l3 to i32
				%l9 = add i32 %sum.02, %l3e
				%indvars.iv.next = add i32 %indvars.iv, 1
				%exitcond = icmp eq i32 %indvars.iv.next, 256
				br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0

				._crit_edge: ; preds = %.lr.ph
				%sum.0.lcssa = phi i32 [ %l9, %.lr.ph ]
				%ret = trunc i32 %sum.0.lcssa to i8
				ret i8 %ret
				}

				!0 = distinct !{!0, !1, !2, !3, !4}
				!1 = !{!"llvm.loop.vectorize.width", i32 8}
				!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!3 = !{!"llvm.loop.interleave.count", i32 2}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}
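
As a rough aid to reading the CHECK lines, the following C++ sketch models the shape of the vectorized loop for a given runtime vscale. It is an illustration under stated assumptions, not the vectorizer's output: the helper names `scalarReduce` and `vectorReduce` are hypothetical, the trip count is assumed to be a multiple of the step so no scalar remainder loop is modelled, and each vector is represented as a plain array of `8 * vscale` lanes. It mirrors the two interleaved accumulators (start value 255 in lane 0 of the first, zero elsewhere), the per-iteration truncate and zero-extend, and the 8-bit add reduction in the middle block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar loop from the test: i32 accumulator masked to 8 bits each
// iteration, truncated to i8 on return.
uint8_t scalarReduce(const std::vector<uint8_t> &A) {
  uint32_t Sum = 255;
  for (uint8_t V : A)
    Sum = (Sum & 255u) + V;
  return static_cast<uint8_t>(Sum);
}

// Hypothetical model of the vectorized loop for a given runtime vscale.
uint8_t vectorReduce(const std::vector<uint8_t> &A, unsigned VScale) {
  const std::size_t Lanes = 8 * static_cast<std::size_t>(VScale);
  const std::size_t Step = 2 * Lanes; // interleave count of 2
  // Assumption: the sketch models no scalar remainder loop.
  assert(VScale > 0 && A.size() % Step == 0);

  // Two accumulators of <vscale x 8 x i32>; the start value sits in
  // lane 0 of the first part only.
  std::vector<uint32_t> Phi0(Lanes, 0), Phi1(Lanes, 0);
  Phi0[0] = 255;

  for (std::size_t Index = 0; Index < A.size(); Index += Step) {
    for (std::size_t L = 0; L < Lanes; ++L) {
      uint32_t Add0 = (Phi0[L] & 255u) + A[Index + L];
      uint32_t Add1 = (Phi1[L] & 255u) + A[Index + Lanes + L];
      // trunc to i8, then zext back to i32 (done in the loop latch).
      Phi0[L] = static_cast<uint8_t>(Add0);
      Phi1[L] = static_cast<uint8_t>(Add1);
    }
  }

  // middle.block: truncate both parts, add them, then reduce across lanes.
  uint8_t Rdx = 0;
  for (std::size_t L = 0; L < Lanes; ++L)
    Rdx = static_cast<uint8_t>(Rdx + static_cast<uint8_t>(Phi0[L]) +
                               static_cast<uint8_t>(Phi1[L]));
  return Rdx;
}
```

For a 256-element input, as in the test's loop bound, `vectorReduce` agrees with `scalarReduce` for any vscale whose step divides 256, which is why the lane count does not need to be known at compile time for the truncation to be safe.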

This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
Closed, Public
