This is an archive of the discontinued LLVM Phabricator instance.

Differential D93615

[LV] Avoid needless fold tail
Closed · Public

Authored by gilr on Dec 20 2020, 11:29 PM.

Details

Reviewers

fhahn
Ayal
SjoerdMeijer

Commits

rGa56280094e08: [LV] Avoid needless fold tail

Summary

When the trip-count is provably divisible by the maximal/chosen VF, folding the loop's tail during vectorization is redundant.
This commit extends the existing test for constant trip-counts to any trip-count known to be divisible by maximal/selected VF by SCEV.
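
The divisibility argument in the summary can be illustrated outside of LLVM. The following is a minimal standalone sketch, not the patch itself: `countTrailingZeros`, `knownMultiple`, and `tailProvablyEmpty` are hypothetical helpers that model, on plain integers, the arithmetic behind the check. It assumes the trip count has the form `n & Mask` for an unknown `n`, as in the test added by this revision, so a lower bound on its trailing zero bits yields a power of two that is known to divide it.

```cpp
#include <cstdint>

// Number of low zero bits in X; by convention 32 when X is 0.
constexpr unsigned countTrailingZeros(std::uint32_t X) {
  if (X == 0)
    return 32;
  unsigned N = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++N;
  }
  return N;
}

// For an unknown n, (n & Mask) keeps at least the trailing zeros of Mask,
// so 2^ctz(Mask) is known to divide it. The result is 64-bit so that a
// shift by 32 (Mask == 0) stays well defined.
constexpr std::uint64_t knownMultiple(std::uint32_t Mask) {
  return std::uint64_t{1} << countTrailingZeros(Mask);
}

// True when every multiple of KnownMultiple is also a multiple of VF * IC,
// i.e. no scalar tail can remain. VF and IC are assumed to be at least 1.
constexpr bool tailProvablyEmpty(std::uint64_t KnownMultiple, unsigned VF,
                                 unsigned IC) {
  return KnownMultiple % (static_cast<std::uint64_t>(VF) * IC) == 0;
}

// n & -8 clears the three low bits, so the trip count is a multiple of 8.
static_assert(knownMultiple(0xFFFFFFF8u) == 8, "n & -8 is a multiple of 8");
// A multiple of 8 leaves no tail for VF=4, so folding would be redundant.
static_assert(tailProvablyEmpty(8, 4, 1), "VF=4 divides 8");
// For VF=16 nothing is proven, and tail folding may still be needed.
static_assert(!tailProvablyEmpty(8, 16, 1), "VF=16 does not divide 8");
```

The model only captures the integer arithmetic; in the patch the trailing-zero bound comes from SCEV rather than from a literal mask.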

Event Timeline

gilr created this revision. Dec 20 2020, 11:29 PM

Herald added subscribers: javed.absar, hiraditya. Dec 20 2020, 11:29 PM

gilr requested review of this revision. Dec 20 2020, 11:29 PM

Herald added a project: Restricted Project. Dec 20 2020, 11:29 PM

Herald added a subscriber: llvm-commits.

gilr added a reviewer: SjoerdMeijer. Dec 21 2020, 12:18 AM

Harbormaster completed remote builds in B83103: Diff 313023. Dec 21 2020, 12:18 AM

SjoerdMeijer added inline comments. Dec 21 2020, 12:43 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5505	Was surprised to see this change, because I thought we were handling it here already. Is this check here still relevant? Or can we "merge" this with the one that you added below?

gilr added inline comments. Dec 21 2020, 5:33 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5505	> Is this check here still relevant?
	Yes, since IC may take non-power-of-2 values. Will add a test to cover that.

Add a test for a constant TC with IC=3.
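
The point about non-power-of-2 interleave counts can be checked with small numbers. The sketch below is illustrative only: the trip count of 24 and the helper names are assumptions chosen for the example, not values taken from the patch or its tests. It compares an exact remainder test, as is possible for a constant trip count, with a test that only knows the largest power of two dividing the trip count.

```cpp
#include <cstdint>

// Exact check, possible when the trip count TC is a known constant.
constexpr bool dividesExactly(std::uint64_t TC, unsigned VF, unsigned IC) {
  return TC > 0 && TC % (static_cast<std::uint64_t>(VF) * IC) == 0;
}

// Largest power of two dividing TC (TC is assumed to be nonzero).
constexpr std::uint64_t largestPowerOf2Divisor(std::uint64_t TC) {
  return TC & (~TC + 1);
}

// Check that relies only on the power-of-two divisor of TC, which is all
// that a trailing-zero-bits bound can provide.
constexpr bool dividesByKnownPowerOf2(std::uint64_t TC, unsigned VF,
                                      unsigned IC) {
  return largestPowerOf2Divisor(TC) % (static_cast<std::uint64_t>(VF) * IC) == 0;
}

// With VF=4 and IC=3 the product is 12, which is not a power of two.
static_assert(dividesExactly(24, 4, 3), "24 is a multiple of 12");
static_assert(largestPowerOf2Divisor(24) == 8, "24 = 8 * 3");
static_assert(!dividesByKnownPowerOf2(24, 4, 3), "8 is not a multiple of 12");
// With a power-of-two IC both checks agree.
static_assert(dividesExactly(24, 4, 2) && dividesByKnownPowerOf2(24, 4, 2),
              "8 divides 24 and 8");
```

Under these assumptions the power-of-two test cannot prove divisibility by 12, which is why the exact check on constant trip counts is kept alongside the new one.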

SjoerdMeijer added inline comments. Dec 21 2020, 5:42 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5505	Ah yeah, I do see that now. Thanks

LGTM

This revision is now accepted and ready to land. Dec 21 2020, 6:46 AM

LGTM, thanks

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll
10 ↗	(On Diff #313091)	Is this enough? I think it might be better to check the whole vector body?

gilr added inline comments. Dec 22 2020, 12:02 AM

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll
10 ↗	(On Diff #313091)	Tried to keep the checks to the minimum proving unmasked vectorization, but perhaps indeed better to check the whole context - will switch to update_test format for easy maintenance. Thanks @SjoerdMeijer, @fhahn!

Closed by commit rGa56280094e08: [LV] Avoid needless fold tail (authored by gilr). Dec 22 2020, 12:26 AM

This revision was automatically updated to reflect the committed changes.

gilr added a commit: rGa56280094e08: [LV] Avoid needless fold tail.

fhahn added inline comments. Dec 22 2020, 1:41 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5505	Thinking a bit more about this, I think we should be able to use `ScalarEvolution::getURemExpr` to check if the trip count is a multiple of any VF. That should work for both the constant and variable trip-count cases. As I missed commenting on that before the patch landed, I put up D93677

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	13 lines
llvm/test/Transforms/LoopVectorize/avoid-needless-fold-tail.ll	25 lines

Diff 313023

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[5,496 preceding lines not shown]
  LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
  [...]
    }

    assert(!MaxVF.isScalable() &&
           "Scalable vectors do not yet support tail folding");
    assert((UserVF.isNonZero() || isPowerOf2_32(MaxVF.getFixedValue())) &&
           "MaxVF must be a power of 2");
    unsigned MaxVFtimesIC =
        UserIC ? MaxVF.getFixedValue() * UserIC : MaxVF.getFixedValue();
    if (TC > 0 && TC % MaxVFtimesIC == 0) {
      // Accept MaxVF if we do not have a tail.
      LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
      return MaxVF;
    }

+   // Avoid tail folding if the trip count is known to be a multiple of any VF we
+   // chose.
+   ScalarEvolution *SE = PSE.getSE();
+   const SCEV *BackedgeTakenCount = PSE.getBackedgeTakenCount();
+   const SCEV *ExitCount = SE->getAddExpr(
+       BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));
+   unsigned TCisMultipleOf = 1 << SE->GetMinTrailingZeros(ExitCount);
+   if (TCisMultipleOf % MaxVFtimesIC == 0) {
+     // Accept MaxVF if we do not have a tail.
+     LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
+     return MaxVF;
+   }
+
    // If we don't know the precise trip count, or if the trip count that we
    // found modulo the vectorization factor is not zero, try to fold the tail
    // by masking.
    // FIXME: look for a smaller MaxVF that does divide TC rather than masking.
    if (Legal->prepareToFoldTailByMasking()) {
      FoldTailByMasking = true;
      return MaxVF;
    }
[3,895 following lines not shown]

llvm/test/Transforms/LoopVectorize/avoid-needless-fold-tail.ll

This file was added.

; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S | FileCheck %s

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

; Make sure the loop is vectorized under -Os without folding its tail, based on
; the trip-count's lower bits being zero.
; CHECK: vector.body:
; CHECK: store <4 x i32>

define dso_local void @alignTC(i32* noalias nocapture %A, i32 %n) optsize {
entry:
  %alignedTC = and i32 %n, -8
  br label %loop

loop:
  %riv = phi i32 [ 0, %entry ], [ %rivPlus1, %loop ]
  %arrayidx = getelementptr inbounds i32, i32* %A, i32 %riv
  store i32 13, i32* %arrayidx, align 1
  %rivPlus1 = add nuw nsw i32 %riv, 1
  %cond = icmp eq i32 %rivPlus1, %alignedTC
  br i1 %cond, label %exit, label %loop

exit:
  ret void
}