This is an archive of the discontinued LLVM Phabricator instance.

Differential D47899

[SCEV] Implement (non-exact) getSmallConstantTripCountUpperBound
Abandoned (Public)

Authored by kparzysz on Jun 7 2018, 12:04 PM.

Details

Reviewers

sanjoy
eli.friedman
dcaballe

Summary

This function will calculate an upper bound for the loop trip count. The returned value does not have to be exact: the loop may iterate fewer times in all circumstances.

Motivation:

This loop came from customer code:

struct S {
  unsigned X;
  int Y;
};

int fred(int A, int B, struct S *P) {
  int C = A - 1;
  while ((B & 3) != 0) {
    P->X = C * 128;
    P->Y = 0;
    P++;
    B++;
  }
  return 0;
}

Regardless of what B is, the loop will execute at most 4 times; however, the upper bound returned from getSmallConstantMaxTripCount is CouldNotCompute. This loop ends up getting interleaved by the loop vectorizer, which causes needless code growth.

The maximum can actually be computed from the predicated backedge-taken count, and this patch uses that information to calculate another estimate of the upper bound.
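The bound claimed for the example loop can be checked by brute force, since only the low two bits of B affect the exit condition. The sketch below is not part of the patch: the helper names are hypothetical, the stores through P are omitted because they do not influence the trip count, and unsigned arithmetic is assumed so that wrap-around is well defined (the original uses int). It suggests that the body runs at most 3 times and the condition is evaluated at most 4 times, which agrees with a trip count bound of 4 when the trip count is taken as the backedge-taken count plus one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts how often the body of
//   while ((B & 3) != 0) { ...; B++; }
// runs for a given starting value of B.
unsigned bodyIterations(uint32_t B) {
  unsigned N = 0;
  while ((B & 3u) != 0) {
    // B gets closer to a multiple of 4 on every step, so the body
    // can never start a fourth iteration.
    assert(N < 3 && "body must not run more than 3 times");
    ++B;
    ++N;
  }
  return N;
}

// Largest body count over one representative of each residue of B mod 4.
unsigned maxBodyIterations() {
  unsigned Max = 0;
  for (uint32_t B = 0; B < 4; ++B) {
    unsigned N = bodyIterations(B);
    if (N > Max)
      Max = N;
  }
  return Max;
}

// The loop condition is evaluated once more than the body runs.
unsigned maxConditionEvaluations() { return maxBodyIterations() + 1; }
```

Calling maxBodyIterations() and maxConditionEvaluations() gives the two counts discussed above; the worst case is a starting B with B & 3 equal to 1.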

As is, this patch will break a few lit tests. Right now I just want to get initial feedback on whether this is the right approach.

Diff Detail

Repository: rL LLVM

Event Timeline

kparzysz created this revision. Jun 7 2018, 12:04 PM

efriedma added a subscriber: efriedma. Jun 7 2018, 12:46 PM

efriedma added inline comments.

lib/Analysis/ScalarEvolution.cpp
Line 6393: You can't throw away this list of predicates; the computed trip count isn't valid in general, only under the computed predicates, which might not be true.

In case it wasn't clear, getSmallConstantMaxTripCount must be conservatively correct: we use it to unroll loops with an unknown trip count, so we'll miscompile if the trip count at runtime is larger. We could add a separate method for an "estimated" maximum trip count, using a heuristic like this, if it would be useful for the vectorizer.
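To make the conservativeness requirement concrete, here is a toy model, not LLVM code, of what goes wrong when a transformation trusts a maximum trip count that is too small. The function names are hypothetical, and "unrolling" is modeled simply as emitting MaxTC guarded copies of the body with no loop left afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop: sums every element.
int sumLoop(const std::vector<int> &V) {
  int Sum = 0;
  for (std::size_t I = 0; I < V.size(); ++I)
    Sum += V[I];
  return Sum;
}

// Toy model of a loop fully unrolled under the assumption
// "trip count <= MaxTC": MaxTC guarded copies of the body and nothing else.
int sumUnrolledAssuming(const std::vector<int> &V, std::size_t MaxTC) {
  int Sum = 0;
  for (std::size_t Copy = 0; Copy < MaxTC; ++Copy) {
    if (Copy >= V.size())
      return Sum; // Early exit: fewer iterations than the bound is fine.
    Sum += V[Copy];
  }
  return Sum; // Any iterations beyond MaxTC are silently dropped.
}

// Usage: with 4 elements, a bound of 4 or more is safe, a bound of 2 is not.
void unrollDemo() {
  std::vector<int> V(4, 1);
  assert(sumUnrolledAssuming(V, 4) == sumLoop(V));
  assert(sumUnrolledAssuming(V, 8) == sumLoop(V));
  assert(sumUnrolledAssuming(V, 2) != sumLoop(V));
}
```

An overestimate only costs some unused copies of the body, while an underestimate changes the result. A heuristic estimate is therefore acceptable for cost decisions such as interleaving, but not as a replacement for the conservative bound.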

Ah, I thought it's just an upper bound. Ok, I'll add another function that finds the estimate.

Added a new function to ScalarEvolution to estimate the trip count upper bound.

kparzysz added a reviewer: eli.friedman. Jun 7 2018, 1:32 PM

kparzysz added a reviewer: dcaballe.

The predicated trip count computation isn't modeling the loop in your example well; it thinks the maximum trip count is 2. But I guess that's okay.

include/llvm/Analysis/ScalarEvolution.h
Line 731: The comment is backwards: the important thing is that actual trip count may be larger than the upper bound.
Line 732: I think I'd prefer something like "getSmallConstantTripCountEstimate()"? Not sure.
lib/Analysis/ScalarEvolution.cpp
Line 6397: This is redundant: if the trip count is known, the max trip count is also known.
Line 6404: `getPredicatedBackedgeTakenInfo(L).getMax(L, this, PS)` would compute what you want more directly.
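As a rough illustration of why asking for the maximum is more direct than using the set size of the unsigned range, consider the toy model below. ToyRange is a hypothetical stand-in for a half-open range [Lower, Upper) and is not LLVM's ConstantRange; the concrete ranges are assumptions chosen for the example, not values SCEV is known to compute for this loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open unsigned range [Lower, Upper).
// Assumes Lower < Upper, i.e. non-empty and not wrapping.
struct ToyRange {
  uint32_t Lower;
  uint32_t Upper;

  // Number of distinct values in the range.
  uint64_t setSize() const { return uint64_t(Upper) - Lower; }

  // Largest value contained in the range.
  uint32_t unsignedMax() const { return Upper - 1; }
};

// Trip count bound derived from the largest possible backedge-taken count.
uint64_t tripBoundFromMax(ToyRange BackedgeTaken) {
  return uint64_t(BackedgeTaken.unsignedMax()) + 1;
}

// Usage: the two quantities agree only when the range starts at zero.
void rangeDemo() {
  ToyRange FromZero{0, 4}; // backedge-taken count in {0, 1, 2, 3}
  assert(FromZero.setSize() == 4);
  assert(tripBoundFromMax(FromZero) == 4);

  ToyRange Shifted{5, 9}; // backedge-taken count in {5, 6, 7, 8}
  assert(Shifted.setSize() == 4);
  assert(tripBoundFromMax(Shifted) == 9);
}
```

In this model the set size matches the bound derived from the maximum only when the range starts at 0, so the set size is best read as a heuristic rather than a guaranteed bound.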

In D47899#1125720, @efriedma wrote:

The predicated trip count computation isn't modeling the loop in your example well; it thinks the maximum trip count is 2. But I guess that's okay.

It was 4 yesterday. I'll check it tomorrow when I'm back at work.

include/llvm/Analysis/ScalarEvolution.h
Line 731: The actual trip count must be less than or equal to the upper bound. It wouldn't make any sense otherwise.

In D47899#1125752, @kparzysz wrote:

In D47899#1125720, @efriedma wrote:

The predicated trip count computation isn't modeling the loop in your example well; it thinks the maximum trip count is 2. But I guess that's okay.

It was 4 yesterday. I'll check it tomorrow when I'm back at work.

Confirmed: it computes it as 4.

This needs to be reworked.

kparzysz mentioned this in D47951: [SCEV] Look through zero-extends in howFarToZero. Jun 8 2018, 9:56 AM

Revision Contents

Path	Size
include/llvm/Analysis/ScalarEvolution.h	5 lines
lib/Analysis/ScalarEvolution.cpp	21 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	4 lines

Diff 150400

include/llvm/Analysis/ScalarEvolution.h

(720 lines not shown; enclosing context: public:)
   /// normal unsigned value, if possible. This means that the actual trip
   /// count is always a multiple of the returned value (don't forget the trip
   /// count could very well be zero as well!). As explained in the comments
   /// for getSmallConstantTripCount, this assumes that control exits the loop
   /// via ExitingBlock.
   unsigned getSmallConstantTripMultiple(const Loop *L,
                                         BasicBlock *ExitingBlock);

+  /// Returns the upper bound of the loop trip count as a normal unsigned
+  /// value. The upper bound may be larger than the actual trip count.
+  /// Returns 0 if the trip count is unknown or cannot be estimated.
[inline comment] efriedma: The comment is backwards: the important thing is that actual trip count may be larger than the upper bound.
[inline comment] kparzysz (author): The actual trip count must be less than or equal to the upper bound. It wouldn't make any sense otherwise.
+  unsigned getSmallConstantTripCountUpperBound(const Loop *L);
[inline comment] efriedma: I think I'd prefer something like "getSmallConstantTripCountEstimate()"? Not sure.
+
   /// Get the expression for the number of loop iterations for which this loop
   /// is guaranteed not to exit via ExitingBlock. Otherwise return
   /// SCEVCouldNotCompute.
   const SCEV *getExitCount(const Loop *L, BasicBlock *ExitingBlock);

   /// If the specified loop has a predictable backedge-taken count, return it,
   /// otherwise return a SCEVCouldNotCompute object. The backedge-taken count is
   /// the number of times the loop header will be branched to from within the
(1,271 lines not shown)

lib/Analysis/ScalarEvolution.cpp

(6,384 lines not shown)
 }

 unsigned ScalarEvolution::getSmallConstantMaxTripCount(const Loop *L) {
   const auto *MaxExitCount =
       dyn_cast<SCEVConstant>(getMaxBackedgeTakenCount(L));
   return getConstantTripCount(MaxExitCount);
 }

+unsigned ScalarEvolution::getSmallConstantTripCountUpperBound(const Loop *L) {
[inline comment] efriedma: You can't throw away this list of predicates; the computed trip count isn't valid in general, only under the computed predicates, which might not be true.
+  if (const SCEVConstant *TakenCount =
+          dyn_cast<SCEVConstant>(getBackedgeTakenCount(L))) {
+    if (unsigned C = getConstantTripCount(TakenCount))
+      return C;
[inline comment] efriedma: This is redundant: if the trip count is known, the max trip count is also known.
+  }
+
+  if (unsigned C = getSmallConstantMaxTripCount(L))
+    return C;
+
+  SCEVUnionPredicate Ps;
+  const SCEV *PredBEC = getPredicatedBackedgeTakenCount(L, Ps);
[inline comment] efriedma: `getPredicatedBackedgeTakenInfo(L).getMax(L, this, PS)` would compute what you want more directly.
+  if (PredBEC != getCouldNotCompute()) {
+    APInt S = getUnsignedRange(PredBEC).getSetSize();
+    if (S.getActiveBits() < 32)
+      return S.getZExtValue();
+  }
+
+  return 0;
+}
+
 unsigned ScalarEvolution::getSmallConstantTripMultiple(const Loop *L) {
   if (BasicBlock *ExitingBB = L->getExitingBlock())
     return getSmallConstantTripMultiple(L, ExitingBB);

   // No trip multiple information for multiple exits.
   return 0;
 }

(5,703 lines not shown)

lib/Transforms/Vectorize/LoopVectorize.cpp

(5,142 lines not shown; enclosing context: unsigned LoopVectorizationCostModel::selectInterleaveCount(bool OptForSize,)
   if (OptForSize)
     return 1;

   // We used the distance for the interleave count.
   if (Legal->getMaxSafeDepDistBytes() != -1U)
     return 1;

   // Do not interleave loops with a relatively small trip count.
-  unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
+  unsigned TC = PSE.getSE()->getSmallConstantTripCountUpperBound(TheLoop);
   if (TC > 1 && TC < TinyTripCountInterleaveThreshold)
     return 1;

   unsigned TargetNumRegisters = TTI.getNumberOfRegisters(VF > 1);
   LLVM_DEBUG(dbgs() << "LV: The target has " << TargetNumRegisters
                     << " registers\n");

   if (VF == 1) {
(2,181 lines not shown; enclosing context: #endif /* NDEBUG */)
   if (!HasExpectedTC && LoopVectorizeWithBlockFrequency) {
     auto EstimatedTC = getLoopEstimatedTripCount(L);
     if (EstimatedTC) {
       ExpectedTC = *EstimatedTC;
       HasExpectedTC = true;
     }
   }
   if (!HasExpectedTC) {
-    ExpectedTC = SE->getSmallConstantMaxTripCount(L);
+    ExpectedTC = SE->getSmallConstantTripCountUpperBound(L);
     HasExpectedTC = (ExpectedTC > 0);
   }

   if (HasExpectedTC && ExpectedTC < TinyTripCountVectorThreshold) {
     LLVM_DEBUG(dbgs() << "LV: Found a loop with a very small trip count. "
                       << "This loop is worth vectorizing only if no scalar "
                       << "iteration overheads are incurred.");
     if (Hints.getForce() == LoopVectorizeHints::FK_Enabled)
(292 lines not shown)