This is an archive of the discontinued LLVM Phabricator instance.

Differential D75920

[LoopCacheAnalysis] Improve cost heuristic.
Needs Review · Public

Authored by etiotto on Mar 10 2020, 6:45 AM.


Details

Reviewers

Whitney
bmahjour
Meinersbur
kbarton
jdoerfert

Summary

Improve the cache cost estimation for indexed references that are not consecutive. For example, given the indexed reference 'A[i][j][k]', and assuming the j-loop is in the innermost position in the loop nest containing the array reference, the cost would be equal to RefCost multiplied by the number of iterations of the j-loop (rather than just RefCost).
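As a rough model of the new non-consecutive rule, the sketch below mirrors the loop that this diff adds to `computeRefCost`. The cost starts at the trip count of loop L and is then multiplied by the trip counts of the loops driving subscripts `Index + 1` through `getNumSubscripts() - 2`. It assumes that each subscript is an add recurrence in exactly one loop and that all trip counts are known integers. `nonConsecutiveRefCost` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not LLVM API): SubscriptTripCounts[I] is the trip count
// of the loop whose induction variable drives subscript I, leftmost first.
// Index is the zero-based position of the subscript that uses loop L.
uint64_t nonConsecutiveRefCost(const std::vector<uint64_t> &SubscriptTripCounts,
                               unsigned Index) {
  assert(Index < SubscriptTripCounts.size() && "Index out of range");
  // Start from the trip count of loop L itself.
  uint64_t RefCost = SubscriptTripCounts[Index];
  // Multiply by the trip counts of subscripts Index+1 .. N-2. These are the
  // same bounds as the diff's `I < getNumSubscripts() - 1`, so the last
  // (rightmost) subscript is excluded.
  for (std::size_t I = Index + 1; I + 1 < SubscriptTripCounts.size(); ++I)
    RefCost *= SubscriptTripCounts[I];
  return RefCost;
}
```

Take trip counts i=10, j=20, k=30 for 'A[i][j][k]'. `nonConsecutiveRefCost({10, 20, 30}, 1)` (j innermost) yields 20, which is the j-loop trip count alone. `nonConsecutiveRefCost({10, 20, 30}, 0)` (i innermost) yields 10 × 20 = 200. Only subscripts that lie strictly between L's subscript and the last one contribute an extra factor.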

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

etiotto created this revision. Mar 10 2020, 6:45 AM

Herald added a project: Restricted Project. Mar 10 2020, 6:45 AM

Herald added subscribers: llvm-commits, hiraditya, nemanjai.

Harbormaster failed remote builds in B48684: Diff 249359! Mar 10 2020, 7:32 AM

Herald added a subscriber: wuzish. Mar 10 2020, 7:32 AM

etiotto updated this revision to Diff 249946. Mar 12 2020, 8:19 AM

Harbormaster completed remote builds in B48997: Diff 249946. Mar 12 2020, 9:13 AM

bmahjour added a reviewer: jdoerfert. Mar 13 2020, 9:08 AM

bmahjour added inline comments.

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
107	It would be useful to mention that the index is zero-based, i.e.: "Retrieve the zero-based \p Index of the subscript ..."
llvm/lib/Analysis/LoopCacheAnalysis.cpp
310	But in the code below we multiply the trip count for all indexes starting from the given loop up to the loop for the second-to-last subscript.
320	Why `getNumSubscripts() - 1` instead of just `getNumSubscripts()`?
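The question on line 320 can be made concrete with a small hypothetical helper. It lists the zero-based subscript positions whose loop trip counts are multiplied into RefCost for a given `Index`. You can call it with the diff's upper bound, `getNumSubscripts() - 1`, or with the alternative bound, `getNumSubscripts()`. It models only the loop bounds and leaves out the rest of the cost computation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: zero-based subscript positions whose loop trip counts
// are multiplied into RefCost by a loop of the form
//   for (unsigned I = Index + 1; I < UpperBound; ++I)
std::vector<unsigned> multipliedPositions(unsigned Index, unsigned UpperBound) {
  std::vector<unsigned> Positions;
  for (unsigned I = Index + 1; I < UpperBound; ++I)
    Positions.push_back(I);
  return Positions;
}
```

Consider a four-subscript reference such as 'y[k+1][j][i][l]' with the k-loop at position 0. With the diff's bound, `multipliedPositions(0, 3)` returns positions 1 and 2 (the j- and i-loops). Using `getNumSubscripts()` instead, `multipliedPositions(0, 4)` would also include position 3 (the l-loop).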

etiotto retitled this revision from [LoopCacheAnalysis}: Improve cost heuristic. to [LoopCacheAnalysis] Improve cost heuristic. Mar 13 2020, 11:28 AM

Meinersbur added inline comments. Mar 13 2020, 2:36 PM

llvm/lib/Analysis/LoopCacheAnalysis.cpp
106–119	[suggestion]
	if (auto *TripCount = dyn_cast<SCEVConstant>(BackedgeTakenCount))
	  return SE.getAddExpr(BackedgeTakenCount, SE.getOne(BackedgeTakenCount->getType()));
	LLVM_DEBUG(dbgs() << "Trip count of loop " << L.getName()
	                  << " could not be computed, using DefaultTripCount\n");
	return SE.getConstant(ElemSize.getType(), DefaultTripCount);
107	`isa<SCEVCouldNotCompute>` is redundant: the object cannot be `SCEVCouldNotCompute` and `SCEVConstant` at the same time
451	What if:
	* Multiple subscripts are using the loop variables?
	* A subscript uses multiple loop variables (e.g. `i+j`)?
	* `SCEVAddRecExpr` is processed further, e.g. negated?
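The first of these questions can be illustrated with a toy version of `getSubscriptIndex`. In this model, each subscript is reduced to the ID of the single loop its add recurrence belongs to, or to nothing when it is not an add recurrence. That representation is an assumption for illustration only. Real SCEV subscripts can nest recurrences (as for `i+j`) or wrap them in other expressions, and this model cannot express either case. Like the diff, the search returns on the first match.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for a subscript: the ID of the loop its add recurrence belongs
// to, or std::nullopt when the subscript is not an add recurrence.
using ToySubscript = std::optional<int>;

// Mirrors the diff's first-match search: returns true and sets Index to the
// leftmost subscript whose recurrence belongs to LoopId, else returns false.
bool getSubscriptIndexToy(const std::vector<ToySubscript> &Subscripts,
                          int LoopId, unsigned &Index) {
  for (unsigned Idx = 0; Idx < Subscripts.size(); ++Idx) {
    if (Subscripts[Idx] && *Subscripts[Idx] == LoopId) {
      Index = Idx;
      return true;
    }
  }
  return false;
}
```

For 'A[i][2j+1][3k+2][l]', with hypothetical loop IDs i=0, j=1, k=2, l=3, the k-loop maps to index 2, as the header comment states. For a hypothetical 'B[j][j]', the j-loop is reported only at index 0. Any later subscript driven by the same loop is never seen.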

Revision Contents

Path	Size
llvm/include/llvm/Analysis/LoopCacheAnalysis.h	7 lines
llvm/lib/Analysis/LoopCacheAnalysis.cpp	83 lines
llvm/test/Analysis/LoopCacheAnalysis/PowerPC/matvecmul.ll	6 lines

Diff 249359

llvm/include/llvm/Analysis/LoopCacheAnalysis.h

@@ 98 unchanged lines not shown @@ private:
   bool isLoopInvariant(const Loop &L) const;

   /// Return true if the indexed reference is 'consecutive' in loop \p L.
   /// An indexed reference is 'consecutive' if the only coefficient that uses
   /// the loop induction variable is the rightmost one, and the access stride is
   /// smaller than the cache line size \p CLS.
   bool isConsecutive(const Loop &L, unsigned CLS) const;

+  /// Retrieve the \p Index of the subscript corresponding to the given loop \p
+  /// L. Return true if the subscript index is succesfully located and false
+  /// otherwise. For example given the indexed reference 'A[i][2j+1][3k+2][l]',
+  /// the call 'getSubscriptIndex(loop-k,Index)' would fill \p Index with the
+  /// value 2 and return true.
+  bool getSubscriptIndex(const Loop &L, unsigned &Index) const;
+
   /// Return the coefficient used in the rightmost dimension.
   const SCEV *getLastCoefficient() const;

   /// Return true if the coefficient corresponding to induction variable of
   /// loop \p L in the given \p Subscript is zero or is loop invariant in \p L.
   bool isCoeffForLoopZeroOrInvariant(const SCEV &Subscript,
                                      const Loop &L) const;

@@ 167 unchanged lines not shown @@

llvm/lib/Analysis/LoopCacheAnalysis.cpp

@@ 91 unchanged lines not shown @@ static bool isOneDimensionalArray(const SCEV &AccessFn, const SCEV &ElemSize,

   const SCEV *StepRec = AR->getStepRecurrence(SE);
   if (StepRec && SE.isKnownNegative(StepRec))
     StepRec = SE.getNegativeSCEV(StepRec);

   return StepRec == &ElemSize;
 }

-/// Compute the trip count for the given loop \p L. Return the SCEV expression
-/// for the trip count or nullptr if it cannot be computed.
-static const SCEV *computeTripCount(const Loop &L, ScalarEvolution &SE) {
+/// Compute the trip count for the given loop \p L or assume a default value if
+/// it is not a compile time constant. Return the SCEV expression for the trip
+/// count.
+static const SCEV *computeTripCount(const Loop &L, const SCEV &ElemSize,
+                                    ScalarEvolution &SE) {
   const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(&L);
-  if (isa<SCEVCouldNotCompute>(BackedgeTakenCount) ||
-      !isa<SCEVConstant>(BackedgeTakenCount))
-    return nullptr;
+  const SCEV *TripCount =
+      (!isa<SCEVCouldNotCompute>(BackedgeTakenCount) &&
+       isa<SCEVConstant>(BackedgeTakenCount))
+          ? SE.getAddExpr(BackedgeTakenCount,
+                          SE.getOne(BackedgeTakenCount->getType()))
+          : nullptr;

-  return SE.getAddExpr(BackedgeTakenCount,
-                       SE.getOne(BackedgeTakenCount->getType()));
+  if (!TripCount) {
+    LLVM_DEBUG(dbgs() << "Trip count of loop " << L.getName()
+                      << " could not be computed, using DefaultTripCount\n");
+    TripCount = SE.getConstant(ElemSize.getType(), DefaultTripCount);
+  }
+
+  return TripCount;
 }

 //===----------------------------------------------------------------------===//
 // IndexedReference implementation
 //
 raw_ostream &llvm::operator<<(raw_ostream &OS, const IndexedReference &R) {
   if (!R.IsValid) {
     OS << R.StoreOrLoadInst;

Lint (pre-merge checks, clang-format): please reformat the `<< " could not be computed, using DefaultTripCount\n");` continuation line (the suggested replacement differs only in whitespace).
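The reworked `computeTripCount` returns the backedge-taken count plus one when ScalarEvolution produces a `SCEVConstant`. Otherwise it falls back to `DefaultTripCount`. The plain C++ model below restates that control flow in the early-return shape of Meinersbur's suggestion on lines 106–119. `std::optional` stands in for a count that may or may not be a constant. The value 100 for the default is an assumption about the `DefaultTripCount` option and is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed value of the DefaultTripCount option (hypothetical constant here).
constexpr uint64_t kDefaultTripCount = 100;

// Model of computeTripCount: BackedgeTakenCount holds a value only when
// ScalarEvolution could compute it as a compile-time constant.
uint64_t computeTripCountModel(std::optional<uint64_t> BackedgeTakenCount) {
  // A constant backedge-taken count N means the loop body runs N + 1 times.
  if (BackedgeTakenCount)
    return *BackedgeTakenCount + 1;
  // Not a constant, or not computable at all: fall back to the default.
  return kDefaultTripCount;
}
```

For example, `computeTripCountModel(99)` gives 100 and `computeTripCountModel(std::nullopt)` gives the default. The model needs only one test because a count that is a constant can never also be `SCEVCouldNotCompute`, which is the redundancy pointed out on line 107.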
@@ 147 unchanged lines not shown @@ CacheCostTy IndexedReference::computeRefCost(const Loop &L,
   });

   // If the indexed reference is loop invariant the cost is one.
   if (isLoopInvariant(L)) {
     LLVM_DEBUG(dbgs().indent(4) << "Reference is loop invariant: RefCost=1\n");
     return 1;
   }

-  const SCEV *TripCount = computeTripCount(L, SE);
-  if (!TripCount) {
-    LLVM_DEBUG(dbgs() << "Trip count of loop " << L.getName()
-                      << " could not be computed, using DefaultTripCount\n");
-    const SCEV *ElemSize = Sizes.back();
-    TripCount = SE.getConstant(ElemSize->getType(), DefaultTripCount);
-  }
+  const SCEV *TripCount = computeTripCount(L, *Sizes.back(), SE);
+  assert(TripCount && "Expecting valid TripCount");
   LLVM_DEBUG(dbgs() << "TripCount=" << *TripCount << "\n");

-  // If the indexed reference is 'consecutive' the cost is
-  // (TripCount*Stride)/CLS, otherwise the cost is TripCount.
-  const SCEV *RefCost = TripCount;
+  const SCEV *RefCost = nullptr;

   if (isConsecutive(L, CLS)) {
+    // If the indexed reference is 'consecutive' the cost is
+    // (TripCount*Stride)/CLS.
     const SCEV *Coeff = getLastCoefficient();
     const SCEV *ElemSize = Sizes.back();
+    assert(Coeff->getType() == ElemSize->getType() &&
+           "Expecting the same type");
     const SCEV *Stride = SE.getMulExpr(Coeff, ElemSize);
     const SCEV *CacheLineSize = SE.getConstant(Stride->getType(), CLS);
     Type *WiderType = SE.getWiderType(Stride->getType(), TripCount->getType());
     if (SE.isKnownNegative(Stride))
       Stride = SE.getNegativeSCEV(Stride);
     Stride = SE.getNoopOrAnyExtend(Stride, WiderType);
     TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);
     const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
     RefCost = SE.getUDivExpr(Numerator, CacheLineSize);

     LLVM_DEBUG(dbgs().indent(4)
                << "Access is consecutive: RefCost=(TripCount*Stride)/CLS="
                << *RefCost << "\n");
-  } else
+  } else {
+    // If the indexed reference is not 'consecutive' the cost is equal to the
+    // number of iterations associated with the subscript corresponding to the
+    // loop. For example, given the indexed reference 'A[i][j][k]', and assuming
+    // the j-loop is in the innermost position, the cost would be equal to the
+    // iterations of the j-loop.
+    RefCost = TripCount;
+
+    unsigned Index = 0;
+    bool Found = getSubscriptIndex(L, Index);
+    assert(Found && "Cound not locate a valid Index");
+
+    for (unsigned I = Index + 1; I < getNumSubscripts() - 1; ++I) {
+      const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(getSubscript(I));
+      assert(AR && AR->getLoop() && "Expecting valid loop");
+      const SCEV *TripCount =
+          computeTripCount(*AR->getLoop(), *Sizes.back(), SE);
+      Type *WiderType = SE.getWiderType(RefCost->getType(), TripCount->getType());
+      RefCost = SE.getMulExpr(SE.getNoopOrAnyExtend(RefCost, WiderType),
+                              SE.getNoopOrAnyExtend(TripCount, WiderType));
+    }
+
     LLVM_DEBUG(dbgs().indent(4)
-               << "Access is not consecutive: RefCost=TripCount=" << *RefCost
-               << "\n");
+               << "Access is not consecutive: RefCost=" << *RefCost << "\n");
+  }
+  assert(RefCost && "Expecting a valid RefCost");

   // Attempt to fold RefCost into a constant.
   if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
     return ConstantCost->getValue()->getSExtValue();

   LLVM_DEBUG(dbgs().indent(4)
              << "RefCost is not a constant! Setting to RefCost=InvalidCost "
                 "(invalid value).\n");

Lint (pre-merge checks, clang-format): please reformat the code. The suggested form splits the declaration as `Type *WiderType =` followed by `SE.getWiderType(RefCost->getType(), TripCount->getType());` on the next line.
@@ 101 unchanged lines not shown @@ bool IndexedReference::isConsecutive(const Loop &L, unsigned CLS) const {
   const SCEV *ElemSize = Sizes.back();
   const SCEV *Stride = SE.getMulExpr(Coeff, ElemSize);
   const SCEV *CacheLineSize = SE.getConstant(Stride->getType(), CLS);

   Stride = SE.isKnownNegative(Stride) ? SE.getNegativeSCEV(Stride) : Stride;
   return SE.isKnownPredicate(ICmpInst::ICMP_ULT, Stride, CacheLineSize);
 }

+bool IndexedReference::getSubscriptIndex(const Loop &L, unsigned &Index) const {
+  for (auto Idx : seq<unsigned>(0, getNumSubscripts())) {
+    const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(getSubscript(Idx));
+    if (AR && AR->getLoop() == &L) {
+      Index = Idx;
+      return true;
+    }
+  }
+  return false;
+}
+
 const SCEV *IndexedReference::getLastCoefficient() const {
   const SCEV *LastSubscript = getLastSubscript();
   assert(isa<SCEVAddRecExpr>(LastSubscript) &&
          "Expecting a SCEV add recurrence expression");
   const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(LastSubscript);
   return AR->getStepRecurrence(SE);
 }

@@ 232 unchanged lines not shown @@

llvm/test/Analysis/LoopCacheAnalysis/PowerPC/matvecmul.ll

 ; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s

 target datalayout = "e-m:e-i64:64-n32:64"
 target triple = "powerpc64le-unknown-linux-gnu"

 ; void matvecmul(const double * __restrict y, const double * __restrict x, const double * __restrict b,
 ; const int * __restrict nb, const int * __restrict nx, const int * __restrict ny, const int * __restrict nz) {
 ;
 ; for (int k=1;k<nz,++k)
 ; for (int j=1;j<ny,++j)
 ; for (int i=1;i<nx,++i)
 ; for (int l=1;l<nb,++l)
 ; for (int m=1;m<nb,++m)
 ; y[k+1][j][i][l] = y[k+1][j][i][l] + b[k][j][i][m][l]*x[k][j][i][m]
 ; }

-; CHECK-DAG: Loop 'k_loop' has cost = 30000000000
-; CHECK-DAG: Loop 'j_loop' has cost = 30000000000
-; CHECK-DAG: Loop 'i_loop' has cost = 30000000000
+; CHECK-DAG: Loop 'k_loop' has cost = 10200000000000000
+; CHECK-DAG: Loop 'j_loop' has cost = 102000000000000
+; CHECK-DAG: Loop 'i_loop' has cost = 1020000000000
 ; CHECK-DAG: Loop 'm_loop' has cost = 10700000000
 ; CHECK-DAG: Loop 'l_loop' has cost = 1300000000

 %_elem_type_of_double = type <{ double }>

 ; Function Attrs: norecurse nounwind
 define void @mat_vec_mpy([0 x %_elem_type_of_double]* noalias %y, [0 x %_elem_type_of_double]* noalias readonly %x,
 [0 x %_elem_type_of_double]* noalias readonly %b, i32* noalias readonly %nb, i32* noalias readonly %nx,
@@ 158 unchanged lines not shown @@
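The updated CHECK values can be reproduced by hand. The sketch below rests on several inferences. None of them is stated in the diff, but together they match every CHECK line:

- Each of the five loops gets a trip count of 100. The bounds are loaded from memory, so `DefaultTripCount` applies.
- The cache line size is 128 bytes.
- Elements are 8-byte doubles.
- The load and store of `y` form one reference group, so the groups are `y`, `b` and `x`.
- A loop's cost is the sum of its reference costs multiplied by the trip counts of the other four loops.

`refCost` and `loopCost` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed parameters (inferred, see above); none are read from LLVM.
constexpr uint64_t TripCount = 100; // every loop: DefaultTripCount
constexpr uint64_t ElemSize = 8;    // sizeof(double) in bytes
constexpr uint64_t CLS = 128;       // cache line size in bytes
constexpr unsigned NumLoops = 5;    // k, j, i, l, m

// Hypothetical cost of one reference with respect to a loop whose induction
// variable appears in subscript position Index (-1: invariant in the loop).
uint64_t refCost(int Index, unsigned NumSubscripts) {
  if (Index < 0)
    return 1; // loop invariant
  unsigned Pos = static_cast<unsigned>(Index);
  if (Pos == NumSubscripts - 1)
    return TripCount * ElemSize / CLS; // consecutive, unit stride: 800/128 = 6
  // Non-consecutive (the diff's rule): multiply in the loops of subscripts
  // strictly between Pos and the last subscript.
  uint64_t Cost = TripCount;
  for (unsigned I = Pos + 1; I < NumSubscripts - 1; ++I)
    Cost *= TripCount;
  return Cost;
}

// Hypothetical loop cost: sum of the reference-group costs times the trip
// counts of the other loops in the nest.
uint64_t loopCost(const std::vector<uint64_t> &RefCosts) {
  uint64_t Sum = 0;
  for (uint64_t C : RefCosts)
    Sum += C;
  uint64_t OtherLoops = 1;
  for (unsigned I = 1; I < NumLoops; ++I)
    OtherLoops *= TripCount;
  return Sum * OtherLoops;
}
```

Take the k-loop as an example, with reference shapes y[k+1][j][i][l], b[k][j][i][m][l] and x[k][j][i][m]. The call is `loopCost({refCost(0, 4), refCost(0, 5), refCost(0, 4)})`, which evaluates to (10^6 + 10^8 + 10^6) × 10^8 = 10200000000000000.

Now replace the non-consecutive branch with a flat cost of 100, which is the pre-patch rule. That gives (100 + 100 + 100) × 100^4 = 30000000000 for the k-, j- and i-loops, which matches the removed CHECK lines. The m- and l-loop costs do not change, because none of their non-consecutive references has a subscript between the loop's own subscript and the last one.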
