This is an archive of the discontinued LLVM Phabricator instance.

Differential D72324

[LV] Still vectorise when tail-folding can't find a primary inducation variable
ClosedPublic

Authored by SjoerdMeijer on Jan 7 2020, 5:13 AM.

Details

Reviewers

hsaito
fhahn
samparker
dmgreen
dorit

Commits

rG8f1887456ab4: [LV] Still vectorise when tail-folding can't find a primary inducation variable

Summary

This addresses a vectorisation regression for tail-folded loops that are counting down, e.g. loops as simple as this:

void foo(char *A, char *B, char *C, uint32_t N) {
  while (N > 0) {
    *C++ = *A++ + *B++;
    N--;
  }
}

Such loops can be vectorised, but when tail-folding is requested the vectoriser cannot find a primary induction variable, which it needs in order to predicate the loop. As a result the loop is not vectorised at all, even though it would be if tail-folding were not attempted. This patch therefore adds a check for the primary induction variable at the point where we decide how to lower the scalar epilogue: when there is no primary induction variable, a scalar epilogue loop is allowed (i.e. tail-folding is not requested), so that vectorisation can still be triggered.

Having this check for the primary induction variable makes sense anyway. In addition, in a follow-up I will look into discovering the primary induction variable for counting-down loops earlier, so that these loops can also be tail-folded.
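To make the two lowering strategies concrete, here is a minimal scalar sketch of what each one computes for the loop above. It is not LLVM code: the function names, the use of uint8_t instead of char, and the explicit VF parameter are assumptions made for illustration. The point it shows is that the tail-folded form derives its per-lane mask from a counting-up induction variable, which is why predication needs a primary induction variable.

```cpp
#include <cassert>
#include <cstdint>

// Reference: the counting-down loop from the summary (uint8_t instead of char).
void addCountingDown(const uint8_t *A, const uint8_t *B, uint8_t *C,
                     uint32_t N) {
  while (N > 0) {
    *C++ = static_cast<uint8_t>(*A++ + *B++);
    N--;
  }
}

// Strategy 1: a "vector" body that handles full groups of VF elements,
// followed by a scalar epilogue for the remaining N % VF elements.
void addWithScalarEpilogue(const uint8_t *A, const uint8_t *B, uint8_t *C,
                           uint32_t N, uint32_t VF) {
  assert(VF > 0 && "vectorisation factor must be positive");
  const uint32_t VecEnd = N - N % VF;
  uint32_t I = 0;
  for (; I < VecEnd; I += VF)
    for (uint32_t Lane = 0; Lane < VF; ++Lane)
      C[I + Lane] = static_cast<uint8_t>(A[I + Lane] + B[I + Lane]);
  for (; I < N; ++I) // scalar epilogue
    C[I] = static_cast<uint8_t>(A[I] + B[I]);
}

// Strategy 2: tail folding. Every iteration runs the full-width body, and a
// mask computed from the induction variable I disables lanes past the end.
void addTailFolded(const uint8_t *A, const uint8_t *B, uint8_t *C, uint32_t N,
                   uint32_t VF) {
  assert(VF > 0 && "vectorisation factor must be positive");
  for (uint64_t I = 0; I < N; I += VF)
    for (uint32_t Lane = 0; Lane < VF; ++Lane) {
      const bool Active = I + Lane < N; // the predicate for this lane
      if (Active)
        C[I + Lane] = static_cast<uint8_t>(A[I + Lane] + B[I + Lane]);
    }
}
```

All three functions write the same values to C for any N; the strategies differ only in how the last N % VF elements are handled.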

Event Timeline

SjoerdMeijer created this revision. Jan 7 2020, 5:13 AM

Herald added a project: Restricted Project. Jan 7 2020, 5:13 AM

Herald added subscribers: rkruppe, hiraditya.

samparker added inline comments. Jan 8 2020, 12:39 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7525	The ordering of the predicate tests seems to be a bit off from a readability point of view. If we're only thinking about predication, I would expect an early return if PredicateOptDisabled, which would also include Hints.getPredicate() == LoopVectorizeHints::FK_Disabled. The last piece of logic would then only contain all the values that we require to enable the folding.
7652	nit: why not just pass as reference?

Thanks for looking at this! And also for encouraging me to look at this (my own) spaghetti logic again. But to be fair, quite a few factors play a role here: optimising for minsize takes precedence over the prefer-predicate options, which take precedence over the loop hints, which take precedence over the TTI hook. I have explained this in the comments and have reshuffled the logic accordingly. I am now bailing earlier on PredicateOptDisabled, as you suggested; the loop hints need to be checked last. I think this addresses your comments.
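The precedence order described in this comment can be sketched as a small standalone decision function. This is a hypothetical model for illustration only: the types Tristate, EpilogueLowering and Inputs, and the function chooseLowering, are invented here and are not LLVM's, and placing the primary-induction check directly after the minsize check is an assumption based on the diff in this revision.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not LLVM types.
enum class Tristate { Unset, Enabled, Disabled };
enum class EpilogueLowering { Allowed, NotAllowedOptSize, NotNeededUsePredicate };

struct Inputs {
  bool OptSize = false;                       // optimising for minimum size
  bool VectorizeForced = false;               // loop hint forcing vectorisation
  bool HasPrimaryInduction = true;            // legality found a primary IV
  Tristate PredicateOption = Tristate::Unset; // -prefer-predicate-over-epilog
  Tristate PredicateHint = Tristate::Unset;   // predicate.enable loop hint
  bool TargetPrefersPredicate = false;        // answer of the TTI hook
};

EpilogueLowering chooseLowering(const Inputs &In) {
  // 1) Optimising for size wins, unless vectorisation is forced by a hint.
  if (In.OptSize && !In.VectorizeForced)
    return EpilogueLowering::NotAllowedOptSize;
  // Without a primary induction variable the loop cannot be predicated, so
  // fall back to a scalar epilogue (assumed position in the order).
  if (!In.HasPrimaryInduction)
    return EpilogueLowering::Allowed;
  // 2) The command-line option, if given, decides next.
  if (In.PredicateOption == Tristate::Disabled)
    return EpilogueLowering::Allowed;
  if (In.PredicateOption == Tristate::Enabled)
    return EpilogueLowering::NotNeededUsePredicate;
  // 3) Then the loop hint.
  if (In.PredicateHint == Tristate::Enabled)
    return EpilogueLowering::NotNeededUsePredicate;
  if (In.PredicateHint == Tristate::Disabled)
    return EpilogueLowering::Allowed;
  // 4) Finally the target's preference.
  return In.TargetPrefersPredicate ? EpilogueLowering::NotNeededUsePredicate
                                   : EpilogueLowering::Allowed;
}

// Usage: the option overrides a loop hint that disables predication.
void checkOptionOverridesHint() {
  Inputs In;
  In.PredicateOption = Tristate::Enabled;
  In.PredicateHint = Tristate::Disabled;
  assert(chooseLowering(In) == EpilogueLowering::NotNeededUsePredicate);
}
```

Writing each level as its own early return keeps the precedence readable from top to bottom, which is the readability concern raised in the inline comment above.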

samparker added inline comments. Jan 8 2020, 7:26 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7549	If a hint is provided to disable folding, then we shouldn't even look at any of the Prefer stuff, right? So PredicateOptDisabled = (PreferPredicateOverEpilog.getNumOccurrences() && !PreferPredicateOverEpilog) || Hints.getPredicate() == LoopVectorizeHints::FK_Disabled)

SjoerdMeijer marked an inline comment as done. Jan 8 2020, 8:13 AM

SjoerdMeijer added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7549	We have these test cases: Transforms/LoopVectorize/ARM/tail-loop-folding.ll Transforms/LoopVectorize/X86/tail_loop_folding.ll That have functions with loop hint `predicate.enable=false` and also option `-prefer-predicate-over-epilog` set. The expected output (in these tests) is that this will enable predication, and thus the option overrides the loop hint. That's why I didn't move the loophint check to `PredicateOptDisabled`.

LGTM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7549	Bah, I've missed some brackets - sorry!

This revision is now accepted and ready to land. Jan 9 2020, 12:44 AM

Closed by commit rG8f1887456ab4: [LV] Still vectorise when tail-folding can't find a primary inducation variable (authored by SjoerdMeijer). Jan 9 2020, 1:21 AM

This revision was automatically updated to reflect the committed changes.

Ayal mentioned this in D77635: [LV] Vectorize with FoldTail when Primary Induction is absent. Apr 7 2020, 2:31 AM

Revision Contents

Path                                                                   Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                        38 lines
llvm/test/Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll   47 lines
llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll       42 lines

Diff 236555

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ (7,496 lines not shown) @@ if (!Mask)
     return State.ILV->vectorizeMemoryInstruction(&Instr);
 
   InnerLoopVectorizer::VectorParts MaskValues(State.UF);
   for (unsigned Part = 0; Part < State.UF; ++Part)
     MaskValues[Part] = State.get(Mask, Part);
   State.ILV->vectorizeMemoryInstruction(&Instr, &MaskValues);
 }
 
+// Determine how to lower the scalar epilogue, which depends if we optimise
+// for minimum code-size, if options or loop hints forcing predication are set,
+// and a TTI hook that analyses whether the loop is suitable for predication.
 static ScalarEpilogueLowering
 getScalarEpilogueLowering(Function *F, Loop *L, LoopVectorizeHints &Hints,
                           ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI,
                           TargetTransformInfo *TTI, TargetLibraryInfo *TLI,
                           AssumptionCache *AC, LoopInfo *LI,
                           ScalarEvolution *SE, DominatorTree *DT,
-                          const LoopAccessInfo *LAI) {
-  ScalarEpilogueLowering SEL = CM_ScalarEpilogueAllowed;
+                          LoopVectorizationLegality *LVL) {
+  bool OptSize = F->hasOptSize() ||
+                 llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI,
+                                             PGSOQueryType::IRPass);
+  if (Hints.getForce() != LoopVectorizeHints::FK_Enabled && OptSize)
+    return CM_ScalarEpilogueNotAllowedOptSize;
+
+  // If we don't have a primary induction variable, don't try to predicate the
+  // vector body because for this an induction variable is required.
+  // Vectorisation would fail, which is not what we want if the loop could be
+  // vectorised with a scalar epilogue.
+  if (!LVL->getPrimaryInduction())
+    return CM_ScalarEpilogueAllowed;
 
   bool PredicateOptDisabled = PreferPredicateOverEpilog.getNumOccurrences() &&
                               !PreferPredicateOverEpilog;
-  if (Hints.getForce() != LoopVectorizeHints::FK_Enabled &&
-      (F->hasOptSize() ||
-       llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI,
-                                   PGSOQueryType::IRPass)))
-    SEL = CM_ScalarEpilogueNotAllowedOptSize;
-  else if (PreferPredicateOverEpilog ||
-           Hints.getPredicate() == LoopVectorizeHints::FK_Enabled ||
-           (TTI->preferPredicateOverEpilogue(L, LI, *SE, *AC, TLI, DT, LAI) &&
-            Hints.getPredicate() != LoopVectorizeHints::FK_Disabled &&
-            !PredicateOptDisabled))
-    SEL = CM_ScalarEpilogueNotNeededUsePredicate;
+  if (PreferPredicateOverEpilog ||
+      Hints.getPredicate() == LoopVectorizeHints::FK_Enabled ||
+      (TTI->preferPredicateOverEpilogue(L, LI, *SE, *AC, TLI, DT,
+                                        LVL->getLAI()) &&
+       Hints.getPredicate() != LoopVectorizeHints::FK_Disabled &&
+       !PredicateOptDisabled))
+    return CM_ScalarEpilogueNotNeededUsePredicate;
 
-  return SEL;
+  return CM_ScalarEpilogueAllowed;
 }
 
 // Process the loop in the VPlan-native vectorization path. This path builds
 // VPlan upfront in the vectorization pipeline, which allows to apply
 // VPlan-to-VPlan transformations from the very beginning without modifying the
 // input LLVM IR.
 static bool processLoopInVPlanNativePath(
     Loop *L, PredicatedScalarEvolution &PSE, LoopInfo *LI, DominatorTree *DT,
     LoopVectorizationLegality *LVL, TargetTransformInfo *TTI,
     TargetLibraryInfo *TLI, DemandedBits *DB, AssumptionCache *AC,
     OptimizationRemarkEmitter *ORE, BlockFrequencyInfo *BFI,
     ProfileSummaryInfo *PSI, LoopVectorizeHints &Hints) {
 
   assert(EnableVPlanNativePath && "VPlan-native path is disabled.");
   Function *F = L->getHeader()->getParent();
   InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL->getLAI());
 
   ScalarEpilogueLowering SEL =
       getScalarEpilogueLowering(F, L, Hints, PSI, BFI, TTI, TLI, AC, LI,
-                                PSE.getSE(), DT, LVL->getLAI());
+                                PSE.getSE(), DT, LVL);
 
   LoopVectorizationCostModel CM(SEL, L, PSE, LI, LVL, *TTI, TLI, DB, AC, ORE, F,
                                 &Hints, IAI);
   // Use the planner for outer loop vectorization.
   // TODO: CM is not used at this point inside the planner. Turn CM into an
   // optional argument if we don't need it in the future.
   LoopVectorizationPlanner LVP(L, LI, TLI, TTI, LVL, CM, IAI);

@@ (77 lines not shown) @@ if (!LVL.canVectorize(EnableVPlanNativePath)) {
     Hints.emitRemarkWithHints();
     return false;
   }
 
   // Check the function attributes and profiles to find out if this function
   // should be optimized for size.
   ScalarEpilogueLowering SEL =
       getScalarEpilogueLowering(F, L, Hints, PSI, BFI, TTI, TLI, AC, LI,
-                                PSE.getSE(), DT, LVL.getLAI());
+                                PSE.getSE(), DT, &LVL);
 
   // Entrance to the VPlan-native vectorization path. Outer loops are processed
   // here. They may require CFG and instruction level transformations before
   // even evaluating whether vectorization is profitable. Since we cannot modify
   // the incoming IR, we need to build VPlan upfront in the vectorization
   // pipeline.
   if (!L->empty())
     return processLoopInVPlanNativePath(L, PSE, LI, DT, &LVL, TTI, TLI, DB, AC,
@@ (341 lines not shown) @@

llvm/test/Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll

This file was added.

; RUN: opt < %s -loop-vectorize -S | FileCheck %s
; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s
; RUN: opt < %s -loop-vectorize -disable-mve-tail-predication=false -S | FileCheck %s

; Check that when we can't predicate this loop that it is still vectorised (with
; an epilogue).
; TODO: the reason this can't be predicated is because a primary induction
; variable can't be found (not yet) for this counting down loop. But with that
; fixed, this should be able to be predicated.

; CHECK-LABEL: vector.body:

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv8.1m.main-arm-unknown-eabihf"

define dso_local void @foo(i8* noalias nocapture readonly %A, i8* noalias nocapture readonly %B, i8* noalias nocapture %C, i32 %N) #0 {
entry:
  %cmp6 = icmp eq i32 %N, 0
  br i1 %cmp6, label %while.end, label %while.body.preheader

while.body.preheader:
  br label %while.body

while.body:
  %N.addr.010 = phi i32 [ %dec, %while.body ], [ %N, %while.body.preheader ]
  %C.addr.09 = phi i8* [ %incdec.ptr4, %while.body ], [ %C, %while.body.preheader ]
  %B.addr.08 = phi i8* [ %incdec.ptr1, %while.body ], [ %B, %while.body.preheader ]
  %A.addr.07 = phi i8* [ %incdec.ptr, %while.body ], [ %A, %while.body.preheader ]
  %incdec.ptr = getelementptr inbounds i8, i8* %A.addr.07, i32 1
  %0 = load i8, i8* %A.addr.07, align 1
  %incdec.ptr1 = getelementptr inbounds i8, i8* %B.addr.08, i32 1
  %1 = load i8, i8* %B.addr.08, align 1
  %add = add i8 %1, %0
  %incdec.ptr4 = getelementptr inbounds i8, i8* %C.addr.09, i32 1
  store i8 %add, i8* %C.addr.09, align 1
  %dec = add i32 %N.addr.010, -1
  %cmp = icmp eq i32 %dec, 0
  br i1 %cmp, label %while.end.loopexit, label %while.body

while.end.loopexit:
  br label %while.end

while.end:
  ret void
}

attributes #0 = { nofree norecurse nounwind "target-features"="+armv8.1-m.main,+mve.fp" }

llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll

This file was added.

; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s

; Check that when we can't predicate this loop that it is still vectorised (with
; an epilogue).
; TODO: the reason this can't be predicated is because a primary induction
; variable can't be found (not yet) for this counting down loop. But with that
; fixed, this should be able to be predicated.

; CHECK-LABEL: vector.body:

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define dso_local void @foo(i8* noalias nocapture readonly %A, i8* noalias nocapture readonly %B, i8* noalias nocapture %C, i32 %N) {
entry:
  %cmp6 = icmp eq i32 %N, 0
  br i1 %cmp6, label %while.end, label %while.body.preheader

while.body.preheader:
  br label %while.body

while.body:
  %N.addr.010 = phi i32 [ %dec, %while.body ], [ %N, %while.body.preheader ]
  %C.addr.09 = phi i8* [ %incdec.ptr4, %while.body ], [ %C, %while.body.preheader ]
  %B.addr.08 = phi i8* [ %incdec.ptr1, %while.body ], [ %B, %while.body.preheader ]
  %A.addr.07 = phi i8* [ %incdec.ptr, %while.body ], [ %A, %while.body.preheader ]
  %incdec.ptr = getelementptr inbounds i8, i8* %A.addr.07, i32 1
  %0 = load i8, i8* %A.addr.07, align 1
  %incdec.ptr1 = getelementptr inbounds i8, i8* %B.addr.08, i32 1
  %1 = load i8, i8* %B.addr.08, align 1
  %add = add i8 %1, %0
  %incdec.ptr4 = getelementptr inbounds i8, i8* %C.addr.09, i32 1
  store i8 %add, i8* %C.addr.09, align 1
  %dec = add i32 %N.addr.010, -1
  %cmp = icmp eq i32 %dec, 0
  br i1 %cmp, label %while.end.loopexit, label %while.body

while.end.loopexit:
  br label %while.end

while.end:
  ret void
}