This is an archive of the discontinued LLVM Phabricator instance.

Differential D72324

[LV] Still vectorise when tail-folding can't find a primary inducation variable
ClosedPublic

Authored by SjoerdMeijer on Jan 7 2020, 5:13 AM.

Details

Reviewers

hsaito
fhahn
samparker
dmgreen
dorit

Commits

rG8f1887456ab4: [LV] Still vectorise when tail-folding can't find a primary inducation variable

Summary

This addresses a vectorisation regression for tail-folded loops that are counting down, e.g. loops as simple as this:

#include <stdint.h>

void foo(char *A, char *B, char *C, uint32_t N) {
  while (N > 0) {
    *C++ = *A++ + *B++;
    N--;
  }
}

These are loops that can be vectorised, but when tail-folding is requested, the vectoriser can't find a primary induction variable, which we need for predicating the loop. As a result, the loop isn't vectorised at all, even though it is vectorised when tail-folding is not attempted. So, this adds a check for the primary induction variable where we decide how to lower the scalar epilogue: when there isn't a primary induction variable, a scalar epilogue loop is allowed (i.e. tail-folding is not requested), so that vectorisation can still be triggered.
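To make the role of the induction variable concrete, here is a minimal scalar sketch of what tail-folding does to a loop like the one above. It is not LLVM code: the width `VF` and the name `addTailFolded` are made up for illustration, and the sketch assumes the lane predicate is computed by comparing a counter that starts at 0 against the trip count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vector width, chosen only for this illustration.
constexpr std::uint64_t VF = 4;

// Scalar emulation of a tail-folded loop: each outer iteration stands for one
// "vector" iteration of VF lanes, and a lane only executes while it is in range.
void addTailFolded(const std::uint8_t *A, const std::uint8_t *B,
                   std::uint8_t *C, std::uint32_t N) {
  assert((N == 0 || (A && B && C)) && "non-empty loop needs valid pointers");
  // I plays the role of the primary induction variable: it starts at 0 and
  // advances by VF. It is 64 bits wide so that I += VF cannot wrap.
  for (std::uint64_t I = 0; I < N; I += VF) {
    for (std::uint64_t Lane = 0; Lane < VF; ++Lane) {
      // Per-lane predicate: lanes past the end are masked off, so no scalar
      // remainder loop is needed.
      bool Active = I + Lane < N;
      if (Active)
        C[I + Lane] = static_cast<std::uint8_t>(A[I + Lane] + B[I + Lane]);
    }
  }
}
```

With `N = 7` and `VF = 4`, the second outer iteration runs with only three active lanes. The mask depends on `I`, which is why a loop whose only counter runs downwards gives the predication logic nothing to compare against.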

Having this check for the primary induction variable makes sense anyway. In addition, in a follow-up I will look into discovering the primary induction variable for counting-down loops earlier, so that these loops can also be tail-folded.
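As a sketch of what such a follow-up would need to recognise, the counting-down loop can be rewritten by hand with an explicit counter that starts at 0 and steps by 1, which is the shape assumed here for a primary induction variable. The function names are hypothetical, and nothing below is produced by the vectoriser.

```cpp
#include <cassert>
#include <cstdint>

// Original shape: the only integer recurrence is N, counting down to zero.
void fooCountingDown(const char *A, const char *B, char *C, std::uint32_t N) {
  while (N > 0) {
    *C++ = static_cast<char>(*A++ + *B++);
    N--;
  }
}

// Equivalent shape with an explicit counter I that starts at 0 and steps by 1.
void fooCountingUp(const char *A, const char *B, char *C, std::uint32_t N) {
  for (std::uint32_t I = 0; I < N; ++I)
    C[I] = static_cast<char>(A[I] + B[I]);
}

// Runs both forms on the same small input and checks that they agree.
void checkSameResult() {
  const char A[5] = {1, 2, 3, 4, 5};
  const char B[5] = {10, 20, 30, 40, 50};
  char Down[5] = {};
  char Up[5] = {};
  fooCountingDown(A, B, Down, 5);
  fooCountingUp(A, B, Up, 5);
  for (int I = 0; I < 5; ++I)
    assert(Down[I] == Up[I] && Down[I] == A[I] + B[I]);
}
```

Both functions execute the same number of iterations and touch the same elements; only the second one exposes a counter that a lane mask can be built from.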

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

SjoerdMeijer created this revision. Jan 7 2020, 5:13 AM

Herald added a project: Restricted Project. Jan 7 2020, 5:13 AM

Herald added subscribers: rkruppe, hiraditya.

samparker added inline comments. Jan 8 2020, 12:39 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7516	The ordering of the predicates tests seems to be a bit off from a readability point-of-view. If we're only thinking about predication, I would expect an early return if PredicateOptDisabled, which would also include Hints.getPredicate() == LoopVectorizeHints::FK_Disable. The last piece of logic would then only contain all the values that we require to enable the folding.
7652–7653	nit: why not just pass as reference?

Thanks for looking at this! And also for encouraging me to look at this (my own) spaghetti logic again. But to be fair, we have quite a few factors that play a role here: optimising for minsize takes precedence over the prefer predicate options, which take precedence over the loop hints, which take precedence over the TTI hook. I have explained this in the comments, and have reshuffled the logic accordingly. I am now bailing earlier on PredicateOptDisabled, as you suggested, while the loop hints need to be checked last. I think this addresses your comments.
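The precedence described here can be modelled in a few lines of standalone C++. This is a simplified sketch, not the LLVM implementation: the `Inputs` struct, the enums, and `chooseLowering` are hypothetical stand-ins for the hints, the command-line option, and the TTI hook, and the primary induction variable check from this patch is reduced to a plain boolean.

```cpp
#include <cassert>

// Hypothetical stand-ins for the three possible results.
enum class Lowering { EpilogueAllowed, NotAllowedOptSize, UsePredicate };
// Tri-state for settings that may be absent, explicitly off, or explicitly on.
enum class Tri { Unset, Off, On };

struct Inputs {
  bool OptSize = false;               // function or profile asks for minsize
  bool VectorizeForced = false;       // loop hint forces vectorisation
  Tri PreferPredicateOption = Tri::Unset; // command-line option
  Tri PredicateHint = Tri::Unset;     // loop hint for predication
  bool TTIPrefersPredicate = false;   // result of the target hook
  bool HasPrimaryInduction = true;    // check added by this patch
};

Lowering chooseLowering(const Inputs &In) {
  // 1) Optimising for size wins over everything else.
  if (In.OptSize && !In.VectorizeForced)
    return Lowering::NotAllowedOptSize;
  // 2) An explicit "off" on the command line, or a missing primary induction
  //    variable, requests a scalar epilogue.
  if (In.PreferPredicateOption == Tri::Off || !In.HasPrimaryInduction)
    return Lowering::EpilogueAllowed;
  // 3) and 4) Command-line option, then loop hint, then the target hook.
  if (In.PreferPredicateOption == Tri::On || In.PredicateHint == Tri::On ||
      (In.TTIPrefersPredicate && In.PredicateHint != Tri::Off))
    return Lowering::UsePredicate;
  return Lowering::EpilogueAllowed;
}

// A few scenarios, including the counting-down case from the summary.
void checkPrecedence() {
  Inputs In;
  In.PreferPredicateOption = Tri::On;
  assert(chooseLowering(In) == Lowering::UsePredicate);
  In.HasPrimaryInduction = false; // counting-down loop
  assert(chooseLowering(In) == Lowering::EpilogueAllowed);
  In.OptSize = true;
  assert(chooseLowering(In) == Lowering::NotAllowedOptSize);
}
```

Using early returns keeps each rule independent of the ones below it, which is the readability point raised in the inline comment.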

samparker added inline comments. Jan 8 2020, 7:26 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7538	If a hint is provided to disable folding, then we shouldn't even look at any of the Prefer stuff, right? So PredicateOptDisabled = (PreferPredicateOverEpilog.getNumOccurrences() && !PreferPredicateOverEpilog) || Hints.getPredicate() == LoopVectorizeHints::FK_Disabled)

SjoerdMeijer marked an inline comment as done. Jan 8 2020, 8:13 AM

SjoerdMeijer added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7538	We have these test cases: Transforms/LoopVectorize/ARM/tail-loop-folding.ll Transforms/LoopVectorize/X86/tail_loop_folding.ll That have functions with loop hint `predicate.enable=false` and also option `-prefer-predicate-over-epilog` set. The expected output (in these tests) is that this will enable predication, and thus the option overrides the loop hint. That's why I didn't move the loophint check to `PredicateOptDisabled`.
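The difference between the two placements of the loop-hint check can be shown with two small predicates. This is an illustrative sketch with hypothetical names; it models only the early "predication disabled" test, not the rest of the function.

```cpp
#include <cassert>

// Hypothetical stand-in for the predicate loop hint.
enum class Hint { Undefined, Disabled, Enabled };

// As in the patch: only an explicit "off" value for the command-line option
// triggers the early return.
bool disabledByOption(bool OptionGiven, bool OptionValue) {
  return OptionGiven && !OptionValue;
}

// As suggested in review: a disabling loop hint triggers the early return too.
bool disabledByOptionOrHint(bool OptionGiven, bool OptionValue, Hint H) {
  return disabledByOption(OptionGiven, OptionValue) || H == Hint::Disabled;
}

// Scenario from the existing tests: the hint disables predication while the
// command-line option enables it.
void checkHintVersusOption() {
  assert(!disabledByOption(true, true));
  assert(disabledByOptionOrHint(true, true, Hint::Disabled));
}
```

With the first predicate the function carries on to the later test of the option and requests predication, matching the expected test output. With the second it would return early and allow a scalar epilogue instead.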

LGTM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7538	Bah, I've missed some brackets - sorry!

This revision is now accepted and ready to land. Jan 9 2020, 12:44 AM

Closed by commit rG8f1887456ab4: [LV] Still vectorise when tail-folding can't find a primary inducation variable (authored by SjoerdMeijer). Jan 9 2020, 1:21 AM

This revision was automatically updated to reflect the committed changes.

Ayal mentioned this in D77635: [LV] Vectorize with FoldTail when Primary Induction is absent. Apr 7 2020, 2:31 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	63 lines
llvm/test/Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll	47 lines
llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll	42 lines

Diff 236978

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[... 7,496 lines not shown ...]
   if (!Mask)
     return State.ILV->vectorizeMemoryInstruction(&Instr);

   InnerLoopVectorizer::VectorParts MaskValues(State.UF);
   for (unsigned Part = 0; Part < State.UF; ++Part)
     MaskValues[Part] = State.get(Mask, Part);
   State.ILV->vectorizeMemoryInstruction(&Instr, &MaskValues);
 }

-static ScalarEpilogueLowering
-getScalarEpilogueLowering(Function *F, Loop *L, LoopVectorizeHints &Hints,
-                          ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI,
-                          TargetTransformInfo *TTI, TargetLibraryInfo *TLI,
-                          AssumptionCache *AC, LoopInfo *LI,
-                          ScalarEvolution *SE, DominatorTree *DT,
-                          const LoopAccessInfo *LAI) {
-  ScalarEpilogueLowering SEL = CM_ScalarEpilogueAllowed;
+// Determine how to lower the scalar epilogue, which depends on 1) optimising
+// for minimum code-size, 2) predicate compiler options, 3) loop hints forcing
+// predication, and 4) a TTI hook that analyses whether the loop is suitable
+// for predication.
+static ScalarEpilogueLowering getScalarEpilogueLowering(
+    Function *F, Loop *L, LoopVectorizeHints &Hints, ProfileSummaryInfo *PSI,
+    BlockFrequencyInfo *BFI, TargetTransformInfo *TTI, TargetLibraryInfo *TLI,
+    AssumptionCache *AC, LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
+    LoopVectorizationLegality &LVL) {
+  bool OptSize =
+      F->hasOptSize() || llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI,
+                                                     PGSOQueryType::IRPass);
+  // 1) OptSize takes precedence over all other options, i.e. if this is set,
+  // don't look at hints or options, and don't request a scalar epilogue.
+  if (OptSize && Hints.getForce() != LoopVectorizeHints::FK_Enabled)
+    return CM_ScalarEpilogueNotAllowedOptSize;
+
   bool PredicateOptDisabled = PreferPredicateOverEpilog.getNumOccurrences() &&
                               !PreferPredicateOverEpilog;

-  if (Hints.getForce() != LoopVectorizeHints::FK_Enabled &&
-      (F->hasOptSize() ||
-       llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI,
-                                   PGSOQueryType::IRPass)))
-    SEL = CM_ScalarEpilogueNotAllowedOptSize;
-  else if (PreferPredicateOverEpilog ||
-           Hints.getPredicate() == LoopVectorizeHints::FK_Enabled ||
-           (TTI->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT, LAI) &&
-            Hints.getPredicate() != LoopVectorizeHints::FK_Disabled &&
-            !PredicateOptDisabled))
-    SEL = CM_ScalarEpilogueNotNeededUsePredicate;
+  // 2) Next, if disabling predication is requested on the command line, honour
+  // this and request a scalar epilogue. Also do this if we don't have a
+  // primary induction variable, which is required for predication.
+  if (PredicateOptDisabled || !LVL.getPrimaryInduction())
+    return CM_ScalarEpilogueAllowed;
+
+  // 3) and 4) look if enabling predication is requested on the command line,
+  // with a loop hint, or if the TTI hook indicates this is profitable, request
+  // predication .
+  if (PreferPredicateOverEpilog ||
+      Hints.getPredicate() == LoopVectorizeHints::FK_Enabled ||
+      (TTI->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT,
+                                        LVL.getLAI()) &&
+       Hints.getPredicate() != LoopVectorizeHints::FK_Disabled))
+    return CM_ScalarEpilogueNotNeededUsePredicate;

-  return SEL;
+  return CM_ScalarEpilogueAllowed;
 }

 // Process the loop in the VPlan-native vectorization path. This path builds
 // VPlan upfront in the vectorization pipeline, which allows to apply
 // VPlan-to-VPlan transformations from the very beginning without modifying the
 // input LLVM IR.
 static bool processLoopInVPlanNativePath(
     Loop *L, PredicatedScalarEvolution &PSE, LoopInfo *LI, DominatorTree *DT,
     LoopVectorizationLegality *LVL, TargetTransformInfo *TTI,
     TargetLibraryInfo *TLI, DemandedBits *DB, AssumptionCache *AC,
     OptimizationRemarkEmitter *ORE, BlockFrequencyInfo *BFI,
     ProfileSummaryInfo *PSI, LoopVectorizeHints &Hints) {

   assert(EnableVPlanNativePath && "VPlan-native path is disabled.");
   Function *F = L->getHeader()->getParent();
   InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL->getLAI());

-  ScalarEpilogueLowering SEL =
-      getScalarEpilogueLowering(F, L, Hints, PSI, BFI, TTI, TLI, AC, LI,
-                                PSE.getSE(), DT, LVL->getLAI());
+  ScalarEpilogueLowering SEL = getScalarEpilogueLowering(
+      F, L, Hints, PSI, BFI, TTI, TLI, AC, LI, PSE.getSE(), DT, *LVL);

   LoopVectorizationCostModel CM(SEL, L, PSE, LI, LVL, *TTI, TLI, DB, AC, ORE, F,
                                 &Hints, IAI);
   // Use the planner for outer loop vectorization.
   // TODO: CM is not used at this point inside the planner. Turn CM into an
   // optional argument if we don't need it in the future.
   LoopVectorizationPlanner LVP(L, LI, TLI, TTI, LVL, CM, IAI);

[... 75 lines not shown ...]
 #endif /* NDEBUG */
   if (!LVL.canVectorize(EnableVPlanNativePath)) {
     LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");
     Hints.emitRemarkWithHints();
     return false;
   }

   // Check the function attributes and profiles to find out if this function
   // should be optimized for size.
-  ScalarEpilogueLowering SEL =
-      getScalarEpilogueLowering(F, L, Hints, PSI, BFI, TTI, TLI, AC, LI,
-                                PSE.getSE(), DT, LVL.getLAI());
+  ScalarEpilogueLowering SEL = getScalarEpilogueLowering(
+      F, L, Hints, PSI, BFI, TTI, TLI, AC, LI, PSE.getSE(), DT, LVL);

   // Entrance to the VPlan-native vectorization path. Outer loops are processed
   // here. They may require CFG and instruction level transformations before
   // even evaluating whether vectorization is profitable. Since we cannot modify
   // the incoming IR, we need to build VPlan upfront in the vectorization
   // pipeline.
   if (!L->empty())
     return processLoopInVPlanNativePath(L, PSE, LI, DT, &LVL, TTI, TLI, DB, AC,
[... 341 lines not shown ...]

llvm/test/Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll

This file was added.

; RUN: opt < %s -loop-vectorize -S | FileCheck %s
; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s
; RUN: opt < %s -loop-vectorize -disable-mve-tail-predication=false -S | FileCheck %s

; Check that when we can't predicate this loop that it is still vectorised (with
; an epilogue).
; TODO: the reason this can't be predicated is because a primary induction
; variable can't be found (not yet) for this counting down loop. But with that
; fixed, this should be able to be predicated.

; CHECK-LABEL: vector.body:

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv8.1m.main-arm-unknown-eabihf"

define dso_local void @foo(i8* noalias nocapture readonly %A, i8* noalias nocapture readonly %B, i8* noalias nocapture %C, i32 %N) #0 {
entry:
  %cmp6 = icmp eq i32 %N, 0
  br i1 %cmp6, label %while.end, label %while.body.preheader

while.body.preheader:
  br label %while.body

while.body:
  %N.addr.010 = phi i32 [ %dec, %while.body ], [ %N, %while.body.preheader ]
  %C.addr.09 = phi i8* [ %incdec.ptr4, %while.body ], [ %C, %while.body.preheader ]
  %B.addr.08 = phi i8* [ %incdec.ptr1, %while.body ], [ %B, %while.body.preheader ]
  %A.addr.07 = phi i8* [ %incdec.ptr, %while.body ], [ %A, %while.body.preheader ]
  %incdec.ptr = getelementptr inbounds i8, i8* %A.addr.07, i32 1
  %0 = load i8, i8* %A.addr.07, align 1
  %incdec.ptr1 = getelementptr inbounds i8, i8* %B.addr.08, i32 1
  %1 = load i8, i8* %B.addr.08, align 1
  %add = add i8 %1, %0
  %incdec.ptr4 = getelementptr inbounds i8, i8* %C.addr.09, i32 1
  store i8 %add, i8* %C.addr.09, align 1
  %dec = add i32 %N.addr.010, -1
  %cmp = icmp eq i32 %dec, 0
  br i1 %cmp, label %while.end.loopexit, label %while.body

while.end.loopexit:
  br label %while.end

while.end:
  ret void
}

attributes #0 = { nofree norecurse nounwind "target-features"="+armv8.1-m.main,+mve.fp" }

llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll

This file was added.

; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s

; Check that when we can't predicate this loop that it is still vectorised (with
; an epilogue).
; TODO: the reason this can't be predicated is because a primary induction
; variable can't be found (not yet) for this counting down loop. But with that
; fixed, this should be able to be predicated.

; CHECK-LABEL: vector.body:

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define dso_local void @foo(i8* noalias nocapture readonly %A, i8* noalias nocapture readonly %B, i8* noalias nocapture %C, i32 %N) {
entry:
  %cmp6 = icmp eq i32 %N, 0
  br i1 %cmp6, label %while.end, label %while.body.preheader

while.body.preheader:
  br label %while.body

while.body:
  %N.addr.010 = phi i32 [ %dec, %while.body ], [ %N, %while.body.preheader ]
  %C.addr.09 = phi i8* [ %incdec.ptr4, %while.body ], [ %C, %while.body.preheader ]
  %B.addr.08 = phi i8* [ %incdec.ptr1, %while.body ], [ %B, %while.body.preheader ]
  %A.addr.07 = phi i8* [ %incdec.ptr, %while.body ], [ %A, %while.body.preheader ]
  %incdec.ptr = getelementptr inbounds i8, i8* %A.addr.07, i32 1
  %0 = load i8, i8* %A.addr.07, align 1
  %incdec.ptr1 = getelementptr inbounds i8, i8* %B.addr.08, i32 1
  %1 = load i8, i8* %B.addr.08, align 1
  %add = add i8 %1, %0
  %incdec.ptr4 = getelementptr inbounds i8, i8* %C.addr.09, i32 1
  store i8 %add, i8* %C.addr.09, align 1
  %dec = add i32 %N.addr.010, -1
  %cmp = icmp eq i32 %dec, 0
  br i1 %cmp, label %while.end.loopexit, label %while.body

while.end.loopexit:
  br label %while.end

while.end:
  ret void
}