This is an archive of the discontinued LLVM Phabricator instance.

Differential D130755

[LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding for scalable vectors
Closed, Public

Authored by dtemirbulatov on Jul 29 2022, 3:38 AM.

Details

Reviewers

david-arm
paulwalker-arm
peterwaller-arm
sdesmalen

Summary

After D121595 was committed, I noticed regressions associated with tail-folded vectorization of loops with small trip counts when using scalable vectors. As a solution to those issues, I propose introducing a minimal trip count threshold value.

Diff Detail

Unit Tests: Failed

	Time	Test
	1,410 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-finalstats.test

Event Timeline

dtemirbulatov created this revision. Jul 29 2022, 3:38 AM

Herald added a project: Restricted Project. Jul 29 2022, 3:38 AM

Herald added subscribers: shiva0217, frasercrmck, luismarques and 20 others.

dtemirbulatov requested review of this revision. Jul 29 2022, 3:38 AM

Herald added a project: Restricted Project. Jul 29 2022, 3:38 AM

Herald added subscribers: alextsao1999, pcwang-thead, MaskRay.

c-rhodes added a subscriber: c-rhodes. Jul 29 2022, 3:50 AM

Harbormaster completed remote builds in B178242: Diff 448571. Jul 29 2022, 4:33 AM

paulwalker-arm added inline comments. Jul 29 2022, 5:52 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
192	Typo here, should be TailFolding?

Fixed typo.

dtemirbulatov marked an inline comment as done. Jul 29 2022, 6:42 AM

dtemirbulatov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
192	Done. thanks Paul.

Harbormaster completed remote builds in B178275: Diff 448616. Jul 29 2022, 7:24 AM

paulwalker-arm added inline comments. Aug 4 2022, 4:27 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
192–195	We're late into the LLVM 15 release cycle so here we're trying to fix an LLVM 14 regression without resorting to reversing the decision to use tail folding for all low trip count loops. I think it'll be better to now do this in a way with the most minimal impact to other targets and so rather than a global command line option can you instead add a TTI hook. That way we can restrict the change to just AArch64/SVE and if it turns out we can rely on a more accurate cost model in the future and there are no other uses at that time, we can remove the TTI hook.
5088–5092	This doesn't look like the correct place to decide this. Within `LoopVectorizePass::processLoop` the code exists to choose `CM_ScalarEpilogueNotAllowedLowTripLoop` for low trip count loops. I think that's the code we want to change to also consider the new tail folding specific watermark.
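
The TTI-hook approach suggested here can be sketched in isolation. This is a simplified illustration, not LLVM's actual TargetTransformInfo plumbing: `BaseTargetInfo`, `SveTargetInfo` and `tripCountAllowsTailFolding` are hypothetical names, and the value 5 is borrowed from the AArch64 override in Diff 450808.

```cpp
// Simplified sketch of a target hook with a neutral default.
// BaseTargetInfo and SveTargetInfo are hypothetical names, not LLVM classes.
struct BaseTargetInfo {
  // Default: no lower bound, so other targets keep their current behaviour.
  constexpr unsigned getMinTripCountTailFoldingThreshold() const { return 0; }
};

struct SveTargetInfo {
  bool HasSVE;
  // Only a target with SVE opts in to a non-zero threshold.
  constexpr unsigned getMinTripCountTailFoldingThreshold() const {
    return HasSVE ? 5 : 0;
  }
};

// Generic client code: it queries whichever target description it is given.
template <typename TargetInfoT>
constexpr bool tripCountAllowsTailFolding(const TargetInfoT &TTI,
                                          unsigned ExpectedTC) {
  return ExpectedTC > TTI.getMinTripCountTailFoldingThreshold();
}
```

Because the default returns 0, the comparison holds for any loop with at least one iteration, so targets that do not override the hook see no change in behaviour.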

Matt added a subscriber: Matt. Aug 4 2022, 8:34 AM

Addressed remarks.

Harbormaster completed remote builds in B179501: Diff 450275. Aug 5 2022, 6:59 AM

paulwalker-arm added inline comments. Aug 5 2022, 8:15 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10148–10161	Can this be `if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;` and still do what you need?

dtemirbulatov marked an inline comment as done. Aug 5 2022, 9:50 AM

dtemirbulatov added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10148–10161	I have to return false in order to prevent vectorization.

paulwalker-arm added inline comments. Aug 7 2022, 1:39 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10148–10161	Thanks @dtemirbulatov. In which case it looks like you need to restructure the code a little so we emit the necessary remark and debug information. e.g.

if (Hints.getForce() == LoopVectorizeHints::FK_Enabled)
...
else if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) {
  LLVM_DEBUG(dbgs() << "\n");
  SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
} else {
  LLVM_DEBUG(dbgs() << " But the target considers the trip count too small to consider vectorizing.\n");
  reportVectorizationFailure(....);
  Hints.emitRemarkWithHints();
  return false;
}
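
The restructuring suggested above can be sketched as a standalone decision function. This is only an illustration under assumptions: `LowTripCountAction`, `TripCountEstimate` and `classifyLowTripCount` are hypothetical names, not LoopVectorize code, and the remark and debug output are left out.

```cpp
// Sketch of the low-trip-count decision discussed above. All names are
// hypothetical stand-ins for the LoopVectorize internals, not LLVM API.
enum class LowTripCountAction {
  NoRestriction,    // trip count unknown, not small, or vectorization forced
  TailFoldOnly,     // vectorize only if no scalar epilogue is needed
  SkipVectorization // at or below the target's minimum: do not vectorize
};

// Stand-in for an optional trip count estimate.
struct TripCountEstimate {
  bool Known;
  unsigned Value;
};

constexpr LowTripCountAction
classifyLowTripCount(TripCountEstimate ExpectedTC, bool Forced,
                     unsigned TinyThreshold, unsigned TargetMinThreshold) {
  // The special handling applies only to known, small trip counts.
  if (!ExpectedTC.Known || ExpectedTC.Value >= TinyThreshold)
    return LowTripCountAction::NoRestriction;
  // An explicit user request to vectorize overrides the heuristics.
  if (Forced)
    return LowTripCountAction::NoRestriction;
  // Above the target's minimum, tail folding is still considered.
  if (ExpectedTC.Value > TargetMinThreshold)
    return LowTripCountAction::TailFoldOnly;
  // Otherwise the target considers the trip count too small.
  return LowTripCountAction::SkipVectorization;
}
```

With the values used in this revision (`vectorizer-min-trip-count` defaulting to 16 and an SVE threshold of 5), an expected trip count of 5 is skipped, while a trip count of 6 remains eligible for tail-folded vectorization.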

Updating proposed change according to @paulwalker-arm remarks.

A minor pre-push change but otherwise looks good. Once landed can you please do the necessary to request an LLVM 15 backport and then we can turn our attention to potentially fixing up the cost model so this rather brute force fix can be removed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10152	To make the output consistent please add a <space> before the 'But' because otherwise you'll end up printing `overheads are incurred.But the target considers...`

This revision is now accepted and ready to land. Aug 8 2022, 9:15 AM

Harbormaster completed remote builds in B179914: Diff 450808. Aug 8 2022, 10:07 AM

This was committed as cab6cd68340255be241b7cf169c67a1899ced115, but I left extra spaces before "Differential Revision", which is why the revision didn't close automatically.

Revision Contents

Path                                                              Size
llvm/include/llvm/Analysis/TargetTransformInfo.h                  9 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h              2 lines
llvm/lib/Analysis/TargetTransformInfo.cpp                         4 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h              4 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                   15 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll  28 lines

Diff 450808

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@
   /// \returns True if the target wants to expand the given reduction intrinsic
   /// into a shuffle sequence.
   bool shouldExpandReduction(const IntrinsicInst *II) const;

   /// \returns the size cost of rematerializing a GlobalValue address relative
   /// to a stack reload.
   unsigned getGISelRematGlobalCost() const;

+  /// \returns the lower bound of a trip count to decide on vectorization
+  /// while tail-folding.
+  unsigned getMinTripCountTailFoldingThreshold() const;
+
   /// \returns True if the target supports scalable vectors.
   bool supportsScalableVectors() const;

   /// \return true when scalable vectorization is preferred.
   bool enableScalableVectorization() const;

   /// \name Vector Predication Information
   /// @{
@@ ... @@ virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
                                         unsigned ChainSizeInBytes,
                                         VectorType *VecTy) const = 0;
   virtual bool preferInLoopReduction(unsigned Opcode, Type *Ty,
                                      ReductionFlags) const = 0;
   virtual bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
                                                ReductionFlags) const = 0;
   virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;
   virtual unsigned getGISelRematGlobalCost() const = 0;
+  virtual unsigned getMinTripCountTailFoldingThreshold() const = 0;
   virtual bool enableScalableVectorization() const = 0;
   virtual bool supportsScalableVectors() const = 0;
   virtual bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
                                      Align Alignment) const = 0;
   virtual InstructionCost getInstructionLatency(const Instruction *I) = 0;
   virtual VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const = 0;
 };
@@ ... @@ public:
   bool shouldExpandReduction(const IntrinsicInst *II) const override {
     return Impl.shouldExpandReduction(II);
   }

   unsigned getGISelRematGlobalCost() const override {
     return Impl.getGISelRematGlobalCost();
   }

+  unsigned getMinTripCountTailFoldingThreshold() const override {
+    return Impl.getMinTripCountTailFoldingThreshold();
+  }
+
   bool supportsScalableVectors() const override {
     return Impl.supportsScalableVectors();
   }

   bool enableScalableVectorization() const override {
     return Impl.enableScalableVectorization();
   }

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
                                        TTI::ReductionFlags Flags) const {
     return false;
   }

   bool shouldExpandReduction(const IntrinsicInst *II) const { return true; }

   unsigned getGISelRematGlobalCost() const { return 1; }

+  unsigned getMinTripCountTailFoldingThreshold() const { return 0; }
+
   bool supportsScalableVectors() const { return false; }

   bool enableScalableVectorization() const { return false; }

   bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
                              Align Alignment) const {
     return false;
   }

llvm/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {
   return TTIImpl->shouldExpandReduction(II);
 }

 unsigned TargetTransformInfo::getGISelRematGlobalCost() const {
   return TTIImpl->getGISelRematGlobalCost();
 }

+unsigned TargetTransformInfo::getMinTripCountTailFoldingThreshold() const {
+  return TTIImpl->getMinTripCountTailFoldingThreshold();
+}
+
 bool TargetTransformInfo::supportsScalableVectors() const {
   return TTIImpl->supportsScalableVectors();
 }

 bool TargetTransformInfo::enableScalableVectorization() const {
   return TTIImpl->enableScalableVectorization();
 }

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

@@ ... @@ shouldConsiderAddressTypePromotion(const Instruction &I,
                                      bool &AllowPromotionWithoutCommonHeader);

   bool shouldExpandReduction(const IntrinsicInst *II) const { return false; }

   unsigned getGISelRematGlobalCost() const {
     return 2;
   }

+  unsigned getMinTripCountTailFoldingThreshold() const {
+    return ST->hasSVE() ? 5 : 0;
+  }
+
   PredicationStyle emitGetActiveLaneMask() const {
     if (ST->hasSVE())
       return PredicationStyle::DataAndControlFlow;
     return PredicationStyle::None;
   }

   bool preferPredicateOverEpilogue(Loop *L, LoopInfo *LI, ScalarEvolution &SE,
                                    AssumptionCache &AC, TargetLibraryInfo *TLI,

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ cl::desc("When epilogue vectorization is enabled, and a value greater than "
              "1 is specified, forces the given VF for all applicable epilogue "
              "loops."));

 static cl::opt<unsigned> EpilogueVectorizationMinVF(
     "epilogue-vectorization-minimum-VF", cl::init(16), cl::Hidden,
     cl::desc("Only loops with vectorization factor equal to or larger than "
              "the specified value are considered for epilogue vectorization."));

 /// Loops with a known constant trip count below this number are vectorized only
 /// if no scalar iteration overheads are incurred.
 static cl::opt<unsigned> TinyTripCountVectorThreshold(
     "vectorizer-min-trip-count", cl::init(16), cl::Hidden,
     cl::desc("Loops with a constant trip count that is smaller than this "
              "value are vectorized only if no scalar iteration overheads "
              "are incurred."));

 static cl::opt<unsigned> VectorizeMemoryCheckThreshold(
     "vectorize-memory-check-threshold", cl::init(128), cl::Hidden,
     cl::desc("The maximum allowed number of runtime memory checks"));

@@ ... @@ const SCEV *Rem = SE->getURemExpr(
         SE->getConstant(BackedgeTakenCount->getType(), MaxVFtimesIC));
     if (Rem->isZero()) {
       // Accept MaxFixedVF if we do not have a tail.
       LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
       return MaxFactors;
     }
   }

   // If we don't know the precise trip count, or if the trip count that we
   // found modulo the vectorization factor is not zero, try to fold the tail
   // by masking.
   // FIXME: look for a smaller MaxVF that does divide TC rather than masking.
   if (Legal->prepareToFoldTailByMasking()) {
     FoldTailByMasking = true;
     return MaxFactors;
   }

   // If there was a tail-folding hint/switch, but we can't fold the tail by
   // masking, fallback to a vectorization with a scalar epilogue.
   if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
     LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
@@ ... @@ #endif /* NDEBUG */
   auto ExpectedTC = getSmallBestKnownTC(*SE, L);
   if (ExpectedTC && *ExpectedTC < TinyTripCountVectorThreshold) {
     LLVM_DEBUG(dbgs() << "LV: Found a loop with a very small trip count. "
                       << "This loop is worth vectorizing only if no scalar "
                       << "iteration overheads are incurred.");
     if (Hints.getForce() == LoopVectorizeHints::FK_Enabled)
       LLVM_DEBUG(dbgs() << " But vectorizing was explicitly forced.\n");
     else {
+      if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) {
         LLVM_DEBUG(dbgs() << "\n");
         SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
+      } else {
+        LLVM_DEBUG(dbgs() << "But the target considers the trip count too "
+                             "small to consider vectorizing.\n");
+        reportVectorizationFailure(
+            "The trip count is below the minial threshold value.",
+            "loop trip count is too low, avoiding vectorization",
+            "LowTripCount", ORE, L);
+        Hints.emitRemarkWithHints();
+        return false;
+      }
     }
   }

   // Check the function attributes to see if implicit floats are allowed.
   // FIXME: This check doesn't seem possibly correct -- what if the loop is
   // an integer loop and the vector instructions selected are purely integer
   // vector instructions?
   if (F->hasFnAttribute(Attribute::NoImplicitFloat)) {
     reportVectorizationFailure(

llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll

@@ ... @@ for.body: ; preds = %entry, %for.body
   br i1 %exitcond.not, label %for.end, label %for.body

 for.end: ; preds = %for.body
   ret void
 }

 define void @trip5_i8(i8* noalias nocapture noundef %dst, i8* noalias nocapture noundef readonly %src) #0 {
 ; CHECK-LABEL: @trip5_i8(
-; CHECK: vector.body:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
-; CHECK: [[ACTIVE_LANE_MASK:%.*]] = phi <vscale x 16 x i1> [ {{%.*}}, %vector.ph ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], %vector.body ]
-; CHECK: {{%.*}} = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* {{%.*}}, i32 1, <vscale x 16 x i1> [[ACTIVE_LANE_MASK]], <vscale x 16 x i8> poison)
-; CHECK: {{%.*}} = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* {{%.*}}, i32 1, <vscale x 16 x i1> [[ACTIVE_LANE_MASK]], <vscale x 16 x i8> poison)
-; CHECK: call void @llvm.masked.store.nxv16i8.p0nxv16i8(<vscale x 16 x i8> {{%.*}}, <vscale x 16 x i8>* {{%.*}}, i32 1, <vscale x 16 x i1> [[ACTIVE_LANE_MASK]])
-; CHECK: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[VF:%.*]] = mul i64 [[VSCALE]], 16
-; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[VF]]
-; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 [[INDEX_NEXT]], i64 5)
-; CHECK-NEXT: [[ACTIVE_LANE_MASK_NOT:%.*]] = xor <vscale x 16 x i1> [[ACTIVE_LANE_MASK_NEXT]], shufflevector (<vscale x 16 x i1> insertelement (<vscale x 16 x i1> poison, i1 true, i32 0), <vscale x 16 x i1> poison, <vscale x 16 x i32> zeroinitializer)
-; CHECK-NEXT: br i1 true, label %middle.block, label %vector.body
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
+; CHECK: for.body:
+; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[I_08]]
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[MUL:%.*]] = shl i8 [[TMP0]], 1
+; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8* [[DST:%.*]], i64 [[I_08]]
+; CHECK-NEXT: [[TMP1:%.*]] = load i8, i8* [[ARRAYIDX1]], align 1
+; CHECK-NEXT: [[ADD:%.*]] = add i8 [[MUL]], [[TMP1]]
+; CHECK-NEXT: store i8 [[ADD]], i8* [[ARRAYIDX1]], align 1
+; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1
+; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], 5
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK: for.end:
+; CHECK-NEXT: ret void
 ;
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %i.08 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
   %arrayidx = getelementptr inbounds i8, i8* %src, i64 %i.08
   %0 = load i8, i8* %arrayidx, align 1
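
For orientation, the loop exercised by `@trip5_i8` can be read back from the updated CHECK lines as roughly the following source. This is a reconstruction for illustration only, not the file the test was generated from; the helper `trip5Element` is a hypothetical name introduced here.

```cpp
#include <cstdint>

// One iteration of the loop body: src[i] * 2 + dst[i], truncated to 8 bits
// (the IR performs the multiply as a shift left by one).
// trip5Element is a hypothetical helper, not part of the test.
constexpr std::uint8_t trip5Element(std::uint8_t Src, std::uint8_t Dst) {
  return static_cast<std::uint8_t>(Src * 2 + Dst);
}

// Assumed C++ equivalent of @trip5_i8: exactly five iterations, so the trip
// count is a known constant that does not exceed the SVE threshold of 5.
void trip5_i8(std::uint8_t *dst, const std::uint8_t *src) {
  for (int i = 0; i < 5; ++i)
    dst[i] = trip5Element(src[i], dst[i]);
}
```

With this change the updated CHECK lines expect the loop to stay scalar, whereas the removed ones expected a tail-folded `vector.body` driven by an active lane mask.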
