This is an archive of the discontinued LLVM Phabricator instance.


Differential D122776

[NFC][LoopCacheAnalysis] Add a motivating test case for improved loop cache analysis cost calculation
Closed, Public

Authored by congzhe on Mar 30 2022, 6:39 PM.


Details

Reviewers

bmahjour
Whitney

Group Reviewers

Restricted Project

Commits

rG5e004fb78769: [LoopCacheAnalysis][NFC] Add a test case for improved loop cache analysis cost…

Summary

Loop cache cost analysis (LCC) is supposed to return an ordered vector of loops that reflects their optimal order in the loop nest based on cache locality (as needed by loop interchange, for example). The first loop in the vector should be placed as the outermost loop and the last loop as the innermost loop.

Motivation for this patch:
As discussed in the last loopOptWG meeting, I've been improving LCC in various aspects. Currently LCC only estimates the number of cache lines that each loop accesses and sorts the loops by those numbers. This can produce inaccurate results, for example:

void foo(long n, long m, long o, int A[n][m][o]) {
  for (long j = 0; j < m; j++)
    for (long i = 0; i < n; i++)
      for (long k = 0; k < o; k++)
        A[2*i+3][2*j-4][2*k+7] = 1;
}

Considering cache locality, for this loop nest we would like to place the i-loop as the outermost loop, the j-loop as the middle loop, and the k-loop as the innermost loop, because the i-loop has the largest memory access stride and the k-loop the smallest. In other words, the resulting vector should be [i-loop, j-loop, k-loop]. However, the current LCC returns [j-loop, i-loop, k-loop], which is incorrect if it is used as the cost model for loop interchange.

The result is incorrect because LCC only considers the "cost" of each loop. It outputs the following vector of sorted loops and therefore cannot determine whether the i-loop or the j-loop should be the outermost loop, since both have the same cost:

Loop 'for.j' has cost = 1000000
Loop 'for.i' has cost = 1000000
Loop 'for.k' has cost = 60000

Furthermore, consider the motivating test case in https://reviews.llvm.org/D120386. I will soon post another LCC patch that enables delinearization for fixed-size arrays. With that patch, LCC will be able to estimate the costs of the loops in that test case, but it will still produce an incorrect order, i.e., it will be unable to place each loop in its ideal position in the loop nest, for exactly the reason described above.

This patch proposes an improvement to LCC: besides the "cost" (the estimated number of cache lines accessed), we also consider the "stride", sorting the loops by "cost" first and by "stride" when the costs are equal. For the above loop nest, with this patch LCC gives the following result:

Loop 'for.i' has cost = 1000000 and stride = 80000
Loop 'for.j' has cost = 1000000 and stride = 800
Loop 'for.k' has cost = 60000 and stride = 8

Each loop is now placed in its ideal position.
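The two orderings can be contrasted with a small standalone C++ sketch. It does not use LLVM: LoopCostEntry, precedesByCost, precedesByCostThenStride and orderLoops are hypothetical names, and the (cost, stride) values are copied from the output above. With a cost-only comparator, for.i and for.j compare as equivalent, so which one ends up outermost depends only on the sort algorithm and the input order; with the stride as a tie-breaker, the order [for.i, for.j, for.k] follows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for CacheCost's (Loop, CacheCostTy) pairs:
// a loop name plus an estimated cost and stride (-1 = stride unknown).
struct LoopCostEntry {
  std::string Name;
  int64_t Cost;
  int64_t Stride;
};

// Old behaviour: decreasing cost only. Two loops with the same cost are
// equivalent, so their relative order after sorting is not determined.
bool precedesByCost(const LoopCostEntry &A, const LoopCostEntry &B) {
  return A.Cost > B.Cost;
}

// Proposed behaviour: decreasing cost, then decreasing stride as a tie-breaker.
// Comparing (Cost, Stride) lexicographically keeps this a strict weak ordering;
// an unknown stride (-1) sorts after any known stride with the same cost.
bool precedesByCostThenStride(const LoopCostEntry &A, const LoopCostEntry &B) {
  return std::tie(A.Cost, A.Stride) > std::tie(B.Cost, B.Stride);
}

// Returns the loop names ordered outermost-first under the given comparator.
// stable_sort keeps equivalent loops in their input order.
template <typename Compare>
std::vector<std::string> orderLoops(std::vector<LoopCostEntry> Loops,
                                    Compare Precedes) {
  std::stable_sort(Loops.begin(), Loops.end(), Precedes);
  std::vector<std::string> Names;
  for (const LoopCostEntry &L : Loops)
    Names.push_back(L.Name);
  return Names;
}
```

With the loops listed in source order (j, i, k), the cost-only ordering keeps for.j ahead of for.i, reproducing the [j-loop, i-loop, k-loop] result; that is only one possible outcome, since llvm::sort is not stable. Unlike the comparator in the diff, which ignores the strides whenever either one is unknown, this sketch always compares them lexicographically.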

Note that because the array in this example is parametric-sized, LCC currently uses DefaultTripCount (100 by default) as the estimated trip count, which is where those "cost" numbers come from. This patch does the same when estimating the "stride" numbers: the actual strides for loops i, j and k are 8*m*o, 8*o and 8, respectively, and their corresponding estimates based on DefaultTripCount are 80000, 800 and 8.
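The arithmetic behind these numbers can be reproduced with a short C++ sketch. It assumes DefaultTripCount = 100 for every unknown trip count and array dimension (n, m, o), 4-byte int elements, a cache line size of 128 bytes (an assumption about the PowerPC target used by the test), and that a loop's cost is its reference cost multiplied by the trip counts of the other loops in the nest. estimatedStride, refCost and loopCost are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions (not taken from the patch): every unknown trip count and array
// dimension (n, m, o) is estimated as DefaultTripCount = 100, int is 4 bytes,
// and the cache line size (CLS) is 128 bytes.
constexpr int64_t DefaultTripCount = 100;
constexpr int64_t ElemSize = 4;
constexpr int64_t CLS = 128;

// Estimated byte stride of A[2*i+3][2*j-4][2*k+7] when only loop 'Loop'
// advances by one iteration: the subscript coefficient (2) times the byte
// size of one step in the corresponding dimension.
int64_t estimatedStride(char Loop) {
  const int64_t M = DefaultTripCount; // estimate for m
  const int64_t O = DefaultTripCount; // estimate for o
  switch (Loop) {
  case 'i':
    return 2 * M * O * ElemSize; // actual stride 8*m*o
  case 'j':
    return 2 * O * ElemSize; // actual stride 8*o
  default:
    return 2 * ElemSize; // k-loop: actual stride 8
  }
}

// Reference cost with the loop placed innermost: (TripCount * Stride) / CLS
// if the access is consecutive (the loop's index appears only in the last
// subscript and the stride is below CLS, which holds only for k here),
// otherwise TripCount.
int64_t refCost(char Loop) {
  const int64_t Stride = estimatedStride(Loop);
  const bool Consecutive = (Loop == 'k') && Stride < CLS;
  return Consecutive ? (DefaultTripCount * Stride) / CLS : DefaultTripCount;
}

// Loop cost: the reference cost times the trip counts of the two other loops.
int64_t loopCost(char Loop) {
  return refCost(Loop) * DefaultTripCount * DefaultTripCount;
}
```

Under these assumptions the k-loop's reference cost is (100 * 8) / 128 = 6 after integer division, giving 60000, while the i- and j-loops fall back to TripCount = 100 and both get 1000000.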

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

congzhe created this revision. (Mar 30 2022, 6:39 PM)

Herald added a project: Restricted Project. (Mar 30 2022, 6:39 PM)

Herald added subscribers: hiraditya, nemanjai.

congzhe requested review of this revision. (Mar 30 2022, 6:39 PM)

Herald added a project: Restricted Project. (Mar 30 2022, 6:39 PM)

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B157077: Diff 419304. (Mar 30 2022, 6:43 PM)

congzhe retitled this revision from "[LoopCacheAnalysis] Improve loop cache analysis results by taking strides into consideration" to "[LoopCacheAnalysis] Improve loop cache analysis results by taking memory access strides into consideration". (Mar 30 2022, 6:56 PM)

congzhe edited the summary of this revision.

congzhe edited the summary of this revision. (Mar 30 2022, 7:05 PM)

congzhe added reviewers: bmahjour, Whitney.

congzhe edited the summary of this revision. (Mar 30 2022, 7:13 PM)

congzhe edited the summary of this revision. (Mar 30 2022, 7:31 PM)

congzhe edited the summary of this revision. (Mar 31 2022, 1:42 PM)

congzhe added a reviewer: Restricted Project.

congzhe added a project: Restricted Project. (Mar 31 2022, 1:57 PM)

bmahjour mentioned this in D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost. (Apr 8 2022, 9:18 AM)

Thanks for looking into this, @congzhe. As you described, we don't seem to be able to distinguish between the costs of moving outer loops in nests more than two levels deep. The data locality paper estimates the cost based on the estimated number of cache lines used when a loop is moved into the innermost level. Since the stride information (when it's less than the cache line size) is already used in the RefCost functions, it makes more sense to somehow integrate it into the cost formula when it's larger than the cache line size as well. Treating stride as a separate component of the cost and associating it with the loop (as opposed to each reference group) makes the result of the analysis more complicated to consume and can cause ambiguity when dealing with multiple reference groups.

I think the proper way to solve this is to fold the "stride" information into the cost calculation. In computing the LoopCost(l) function, the paper already uses the notion of estimating the cache lines used based on the product of loop trip counts. I'd like to propose https://reviews.llvm.org/D123400 as a more desirable alternative as it tries to use the same concept to take the depth of the subscript dimensions into account when calculating each RefCost. Comments are welcome. FYI @etiotto.

@bmahjour Hi Bardia, I've updated this patch to a pure NFC patch with the motivating test case only. Looking forward to your comment :)

bmahjour added inline comments. (May 3 2022, 7:20 AM)

llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll
Line 78: Could you please make it clearer how this test is different from the one above (i.e., add a comment to say that this is testing to make sure the analysis prints the loops in the expected order despite the original loop nest being in suboptimal order)?
Lines 88–90: Use CHECK and CHECK-NEXT.

Harbormaster completed remote builds in B162449: Diff 426689. (May 3 2022, 7:46 AM)

Thanks for the comments, I've updated the patch accordingly.

LGTM

This revision is now accepted and ready to land. (May 3 2022, 9:08 AM)

Harbormaster completed remote builds in B162473: Diff 426727. (May 3 2022, 9:49 AM)

This revision was landed with ongoing or failed builds. (May 4 2022, 2:14 PM)

Closed by commit rG5e004fb78769: [LoopCacheAnalysis][NFC] Add a test case for improved loop cache analysis cost… (authored by congzhe).

This revision was automatically updated to reflect the committed changes.

congzhe added a commit: rG5e004fb78769: [LoopCacheAnalysis][NFC] Add a test case for improved loop cache analysis cost….

Revision Contents

llvm/include/llvm/Analysis/LoopCacheAnalysis.h: 55 lines
llvm/lib/Analysis/LoopCacheAnalysis.cpp: 99 lines
llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll: 71 lines

Diff 419304

llvm/include/llvm/Analysis/LoopCacheAnalysis.h

[9 unchanged lines not shown]
/// This file defines the interface for the loop cache analysis.		/// This file defines the interface for the loop cache analysis.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_ANALYSIS_LOOPCACHEANALYSIS_H		#ifndef LLVM_ANALYSIS_LOOPCACHEANALYSIS_H
#define LLVM_ANALYSIS_LOOPCACHEANALYSIS_H		#define LLVM_ANALYSIS_LOOPCACHEANALYSIS_H

#include "llvm/Analysis/LoopAnalysisManager.h"		#include "llvm/Analysis/LoopAnalysisManager.h"
		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

namespace llvm {		namespace llvm {

class AAResults;		class AAResults;
class DependenceInfo;		class DependenceInfo;
class LPMUpdater;		class LPMUpdater;
class ScalarEvolution;		class ScalarEvolution;
class SCEV;		class SCEV;
class TargetTransformInfo;		class TargetTransformInfo;

using CacheCostTy = int64_t;		using CacheCostTy = std::pair<int64_t, int64_t>;
using LoopVectorTy = SmallVector<Loop *, 8>;		using LoopVectorTy = SmallVector<Loop *, 8>;

/// Represents a memory reference as a base pointer and a set of indexing		/// Represents a memory reference as a base pointer and a set of indexing
/// operations. For example given the array reference A[i][2j+1][3k+2] in a		/// operations. For example given the array reference A[i][2j+1][3k+2] in a
/// 3-dim loop nest:		/// 3-dim loop nest:
/// for(i=0;i<n;++i)		/// for(i=0;i<n;++i)
/// for(j=0;j<m;++j)		/// for(j=0;j<m;++j)
/// for(k=0;k<o;++k)		/// for(k=0;k<o;++k)
/// ... A[i][2j+1][3k+2] ...		/// ... A[i][2j+1][3k+2] ...
/// We expect:		/// We expect:
/// BasePointer -> A		/// BasePointer -> A
/// Subscripts -> [{0,+,1}<%for.i>][{1,+,2}<%for.j>][{2,+,3}<%for.k>]		/// Subscripts -> [{0,+,1}<%for.i>][{1,+,2}<%for.j>][{2,+,3}<%for.k>]
/// Sizes -> [m][o][4]		/// Sizes -> [m][o][4]
class IndexedReference {		class IndexedReference {
friend raw_ostream &operator<<(raw_ostream &OS, const IndexedReference &R);		friend raw_ostream &operator<<(raw_ostream &OS, const IndexedReference &R);

public:		public:
/// Construct an indexed reference given a \p StoreOrLoadInst instruction.		/// Construct an indexed reference given a \p StoreOrLoadInst instruction.
IndexedReference(Instruction &StoreOrLoadInst, const LoopInfo &LI,		IndexedReference(Instruction &StoreOrLoadInst, const LoopInfo &LI,
ScalarEvolution &SE);		ScalarEvolution &SE);

bool isValid() const { return IsValid; }		bool isValid() const { return IsValid; }
const SCEV *getBasePointer() const { return BasePointer; }		const SCEV *getBasePointer() const { return BasePointer; }
size_t getNumSubscripts() const { return Subscripts.size(); }		size_t getNumSubscripts() const { return Subscripts.size(); }
		const SCEV *getStepRecurrence(const Loop *L) const {
		return StepRecurrences.lookup(L);
		}
		// Set up the mapping between loops and array access strides for the memory
		// reference. For example,
		// void foo(long n, long m, long o, int A[n][m][o]) {
		// for (long j = 0; j < m; j++)
		// for (long i = 0; i < n; i++)
		// for (long k = 0; k < o; k++)
		// A[2*i+3][2*j-4][2*k+7] = 1;
		// }
		// Loop 'for.i' maps to stride = 8*m*o,
		// Loop 'for.j' maps to stride = 8*o,
		// Loop 'for.k' maps to stride = 8.
		// If m and o are not known constants, we will try to provide estimated
		// strides later.
		void setStepRecurrence(const Loop *L, const SCEV *S) {
		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(S);
		StepRecurrences[L] = AR == nullptr ? nullptr : AR->getStepRecurrence(SE);
		return;
		}
const SCEV *getSubscript(unsigned SubNum) const {		const SCEV *getSubscript(unsigned SubNum) const {
assert(SubNum < getNumSubscripts() && "Invalid subscript number");		assert(SubNum < getNumSubscripts() && "Invalid subscript number");
return Subscripts[SubNum];		return Subscripts[SubNum];
}		}
const SCEV *getFirstSubscript() const {		const SCEV *getFirstSubscript() const {
assert(!Subscripts.empty() && "Expecting non-empty container");		assert(!Subscripts.empty() && "Expecting non-empty container");
return Subscripts.front();		return Subscripts.front();
}		}
[21 unchanged lines not shown; enclosing scope: public:]
/// considered in the innermost position in the loop nest.		/// considered in the innermost position in the loop nest.
/// The cost is defined as:		/// The cost is defined as:
/// - equal to one if the reference is loop invariant, or		/// - equal to one if the reference is loop invariant, or
/// - equal to '(TripCount * stride) / cache_line_size' if:		/// - equal to '(TripCount * stride) / cache_line_size' if:
/// + the reference stride is less than the cache line size, and		/// + the reference stride is less than the cache line size, and
/// + the coefficient of this loop's index variable used in all other		/// + the coefficient of this loop's index variable used in all other
/// subscripts is zero		/// subscripts is zero
/// - or otherwise equal to 'TripCount'.		/// - or otherwise equal to 'TripCount'.
CacheCostTy computeRefCost(const Loop &L, unsigned CLS) const;		CacheCostTy
		computeRefCost(const Loop &L,
		const DenseMap<const SCEV *, unsigned> &EstimatedTripCounts,
		unsigned CLS) const;

private:		private:
/// Attempt to delinearize the indexed reference.		/// Attempt to delinearize the indexed reference.
bool delinearize(const LoopInfo &LI);		bool delinearize(const LoopInfo &LI);

/// Return true if the index reference is invariant with respect to loop \p L.		/// Return true if the index reference is invariant with respect to loop \p L.
bool isLoopInvariant(const Loop &L) const;		bool isLoopInvariant(const Loop &L) const;

[31 unchanged lines not shown; enclosing scope: private:]

/// The subscript (indexes) of the memory reference.		/// The subscript (indexes) of the memory reference.
SmallVector<const SCEV *, 3> Subscripts;		SmallVector<const SCEV *, 3> Subscripts;

/// The dimensions of the memory reference.		/// The dimensions of the memory reference.
SmallVector<const SCEV *, 3> Sizes;		SmallVector<const SCEV *, 3> Sizes;

ScalarEvolution &SE;		ScalarEvolution &SE;

		// The mapping between loops and array access strides for the memory reference.
		SmallDenseMap<const Loop *, const SCEV *> StepRecurrences;
};		};

/// A reference group represents a set of memory references that exhibit		/// A reference group represents a set of memory references that exhibit
/// temporal or spacial reuse. Two references belong to the same		/// temporal or spacial reuse. Two references belong to the same
/// reference group with respect to a inner loop L iff:		/// reference group with respect to a inner loop L iff:
/// 1. they have a loop independent dependency, or		/// 1. they have a loop independent dependency, or
/// 2. they have a loop carried dependence with a small dependence distance		/// 2. they have a loop carried dependence with a small dependence distance
/// (e.g. less than 2) carried by the inner loop, or		/// (e.g. less than 2) carried by the inner loop, or
[23 unchanged lines not shown]
/// - equal to the innermost loop trip count if the reference stride is greater		/// - equal to the innermost loop trip count if the reference stride is greater
/// or equal to the cache line size CLS.		/// or equal to the cache line size CLS.
class CacheCost {		class CacheCost {
friend raw_ostream &operator<<(raw_ostream &OS, const CacheCost &CC);		friend raw_ostream &operator<<(raw_ostream &OS, const CacheCost &CC);
using LoopTripCountTy = std::pair<const Loop *, unsigned>;		using LoopTripCountTy = std::pair<const Loop *, unsigned>;
using LoopCacheCostTy = std::pair<const Loop *, CacheCostTy>;		using LoopCacheCostTy = std::pair<const Loop *, CacheCostTy>;

public:		public:
static CacheCostTy constexpr InvalidCost = -1;		static CacheCostTy constexpr InvalidCost = {-1, -1};

/// Construct a CacheCost object for the loop nest described by \p Loops.		/// Construct a CacheCost object for the loop nest described by \p Loops.
/// The optional parameter \p TRT can be used to specify the max. distance		/// The optional parameter \p TRT can be used to specify the max. distance
/// between array elements accessed in a loop so that the elements are		/// between array elements accessed in a loop so that the elements are
/// classified to have temporal reuse.		/// classified to have temporal reuse.
CacheCost(const LoopVectorTy &Loops, const LoopInfo &LI, ScalarEvolution &SE,		CacheCost(const LoopVectorTy &Loops, const LoopInfo &LI, ScalarEvolution &SE,
TargetTransformInfo &TTI, AAResults &AA, DependenceInfo &DI,		TargetTransformInfo &TTI, AAResults &AA, DependenceInfo &DI,
Optional<unsigned> TRT = None);		Optional<unsigned> TRT = None);

/// Create a CacheCost for the loop nest rooted by \p Root.		/// Create a CacheCost for the loop nest rooted by \p Root.
/// The optional parameter \p TRT can be used to specify the max. distance		/// The optional parameter \p TRT can be used to specify the max. distance
/// between array elements accessed in a loop so that the elements are		/// between array elements accessed in a loop so that the elements are
/// classified to have temporal reuse.		/// classified to have temporal reuse.
static std::unique_ptr<CacheCost>		static std::unique_ptr<CacheCost>
getCacheCost(Loop &Root, LoopStandardAnalysisResults &AR, DependenceInfo &DI,		getCacheCost(Loop &Root, LoopStandardAnalysisResults &AR, DependenceInfo &DI,
Optional<unsigned> TRT = None);		Optional<unsigned> TRT = None);

/// Return the estimated cost of loop \p L if the given loop is part of the		/// Return the estimated cost of loop \p L if the given loop is part of the
/// loop nest associated with this object. Return -1 otherwise.		/// loop nest associated with this object. Return -1 otherwise.
CacheCostTy getLoopCost(const Loop &L) const {		CacheCostTy getLoopCost(const Loop &L) const {
auto IT = llvm::find_if(LoopCosts, [&L](const LoopCacheCostTy &LCC) {		auto IT = llvm::find_if(LoopCosts, [&L](const LoopCacheCostTy &LCC) {
return LCC.first == &L;		return LCC.first == &L;
});		});
return (IT != LoopCosts.end()) ? (*IT).second : -1;		if (IT != LoopCosts.end())
		return (*IT).second;

		return {-1, -1};
}		}

/// Return the estimated ordered loop costs.		/// Return the estimated ordered loop costs.
ArrayRef<LoopCacheCostTy> getLoopCosts() const { return LoopCosts; }		ArrayRef<LoopCacheCostTy> getLoopCosts() const { return LoopCosts; }

private:		private:
/// Calculate the cache footprint of each loop in the nest (when it is		/// Calculate the cache footprint of each loop in the nest (when it is
/// considered to be in the innermost position).		/// considered to be in the innermost position).
[17 unchanged lines not shown; enclosing scope: private:]
/// - equal to '(TripCount * stride) / cache_line_size' if (a) loop \p L's		/// - equal to '(TripCount * stride) / cache_line_size' if (a) loop \p L's
/// induction variable is used only in the reference subscript associated		/// induction variable is used only in the reference subscript associated
/// with loop \p L, and (b) the reference stride is less than the cache		/// with loop \p L, and (b) the reference stride is less than the cache
/// line size, or		/// line size, or
/// - TripCount otherwise		/// - TripCount otherwise
CacheCostTy computeRefGroupCacheCost(const ReferenceGroupTy &RG,		CacheCostTy computeRefGroupCacheCost(const ReferenceGroupTy &RG,
const Loop &L) const;		const Loop &L) const;

/// Sort the LoopCosts vector by decreasing cache cost.		/// Sort the LoopCosts vector by decreasing cache cost, and decreasing stride
		/// if cache costs are the same.
void sortLoopCosts() {		void sortLoopCosts() {
sort(LoopCosts, [](const LoopCacheCostTy &A, const LoopCacheCostTy &B) {		sort(LoopCosts, [this](const LoopCacheCostTy &A, const LoopCacheCostTy &B) {
return A.second > B.second;		const CacheCostTy &CostA = A.second;
		const CacheCostTy &CostB = B.second;
		if (CostA.first == CostB.first && CostA.second != InvalidCost.second &&
		CostB.second != InvalidCost.second) {
		return CostA.second > CostB.second;
		}
		return CostA.first > CostB.first;
});		});
}		}

private:		private:
/// Loops in the loop nest associated with this object.		/// Loops in the loop nest associated with this object.
LoopVectorTy Loops;		LoopVectorTy Loops;

/// Trip counts for the loops in the loop nest associated with this object.		/// Trip counts for the loops in the loop nest associated with this object.
SmallVector<LoopTripCountTy, 3> TripCounts;		SmallVector<LoopTripCountTy, 3> TripCounts;

		/// Mapping between the actual (possibly non-constant) trip counts and the
		/// estimated trip counts.
		DenseMap<const SCEV *, unsigned> EstimatedTripCounts;

/// Cache costs for the loops in the loop nest associated with this object.		/// Cache costs for the loops in the loop nest associated with this object.
SmallVector<LoopCacheCostTy, 3> LoopCosts;		SmallVector<LoopCacheCostTy, 3> LoopCosts;

/// The max. distance between array elements accessed in a loop so that the		/// The max. distance between array elements accessed in a loop so that the
/// elements are classified to have temporal reuse.		/// elements are classified to have temporal reuse.
Optional<unsigned> TRT;		Optional<unsigned> TRT;

const LoopInfo &LI;		const LoopInfo &LI;
[23 unchanged lines not shown]

llvm/lib/Analysis/LoopCacheAnalysis.cpp

[27 unchanged lines not shown]
#include "llvm/Analysis/LoopCacheAnalysis.h"		#include "llvm/Analysis/LoopCacheAnalysis.h"
#include "llvm/ADT/BreadthFirstIterator.h"		#include "llvm/ADT/BreadthFirstIterator.h"
#include "llvm/ADT/Sequence.h"		#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/Delinearization.h"		#include "llvm/Analysis/Delinearization.h"
#include "llvm/Analysis/DependenceAnalysis.h"		#include "llvm/Analysis/DependenceAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-cache-cost"		#define DEBUG_TYPE "loop-cache-cost"

[63 unchanged lines not shown]
static const SCEV *computeTripCount(const Loop &L, ScalarEvolution &SE) {		static const SCEV *computeTripCount(const Loop &L, ScalarEvolution &SE) {
const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(&L);		const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(&L);
if (isa<SCEVCouldNotCompute>(BackedgeTakenCount) ||		if (isa<SCEVCouldNotCompute>(BackedgeTakenCount) ||
!isa<SCEVConstant>(BackedgeTakenCount))		!isa<SCEVConstant>(BackedgeTakenCount))
return nullptr;		return nullptr;
return SE.getTripCountFromExitCount(BackedgeTakenCount);		return SE.getTripCountFromExitCount(BackedgeTakenCount);
}		}

		/// Estimate a constant integer stride for a memory reference given its actual
		/// \p Stride. This would facilitate sorting of memory references (based on
		/// costs and strides). Return \p true if we are able to get an estimated
		/// integer stride.
		static bool
		getEstimatedStride(const SCEV *Stride,
		const DenseMap<const SCEV *, unsigned> &EstimatedTripCounts,
		int64_t &EstimatedStride) {
		auto MulStride = dyn_cast<SCEVMulExpr>(Stride);
		if (MulStride == nullptr)
		return false;

		for (unsigned i = 0; i < MulStride->getNumOperands(); i++) {
		const auto *Op = MulStride->getOperand(i);
		if (auto ConstantOp = dyn_cast<SCEVConstant>(Op)) {
		EstimatedStride *= ConstantOp->getValue()->getSExtValue();
		continue;
		} else if (EstimatedTripCounts.find(Op) != EstimatedTripCounts.end()) {
		EstimatedStride *= EstimatedTripCounts.lookup(Op);
		continue;
		} else {
		EstimatedStride = -1;
		break;
		}
		}
		return EstimatedStride != -1 ? true : false;
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// IndexedReference implementation		// IndexedReference implementation
//		//
raw_ostream &llvm::operator<<(raw_ostream &OS, const IndexedReference &R) {		raw_ostream &llvm::operator<<(raw_ostream &OS, const IndexedReference &R) {
if (!R.IsValid) {		if (!R.IsValid) {
OS << R.StoreOrLoadInst;		OS << R.StoreOrLoadInst;
OS << ", IsValid=false.";		OS << ", IsValid=false.";
return OS;		return OS;
[131 unchanged lines not shown; enclosing scope: if (Level != LoopDepth && !CI.isZero()) {]
return false;		return false;
}		}
}		}

LLVM_DEBUG(dbgs().indent(2) << "Found temporal reuse\n");		LLVM_DEBUG(dbgs().indent(2) << "Found temporal reuse\n");
return true;		return true;
}		}

CacheCostTy IndexedReference::computeRefCost(const Loop &L,		CacheCostTy IndexedReference::computeRefCost(
		const Loop &L, const DenseMap<const SCEV *, unsigned> &EstimatedTripCounts,
unsigned CLS) const {		unsigned CLS) const {
assert(IsValid && "Expecting a valid reference");		assert(IsValid && "Expecting a valid reference");
LLVM_DEBUG({		LLVM_DEBUG({
dbgs().indent(2) << "Computing cache cost for:\n";		dbgs().indent(2) << "Computing cache cost for:\n";
dbgs().indent(4) << *this << "\n";		dbgs().indent(4) << *this << "\n";
});		});

// If the indexed reference is loop invariant the cost is one.		// If the indexed reference is loop invariant the cost is one
		// and the stride is zero.
if (isLoopInvariant(L)) {		if (isLoopInvariant(L)) {
LLVM_DEBUG(dbgs().indent(4) << "Reference is loop invariant: RefCost=1\n");		LLVM_DEBUG(dbgs().indent(4) << "Reference is loop invariant: RefCost=1\n");
return 1;		return {1, 0};
}		}

const SCEV *TripCount = computeTripCount(L, SE);		const SCEV *TripCount = computeTripCount(L, SE);
if (!TripCount) {		if (!TripCount) {
LLVM_DEBUG(dbgs() << "Trip count of loop " << L.getName()		LLVM_DEBUG(dbgs() << "Trip count of loop " << L.getName()
<< " could not be computed, using DefaultTripCount\n");		<< " could not be computed, using DefaultTripCount\n");
const SCEV *ElemSize = Sizes.back();		const SCEV *ElemSize = Sizes.back();
TripCount = SE.getConstant(ElemSize->getType(), DefaultTripCount);		TripCount = SE.getConstant(ElemSize->getType(), DefaultTripCount);
}		}
LLVM_DEBUG(dbgs() << "TripCount=" << *TripCount << "\n");		LLVM_DEBUG(dbgs() << "TripCount=" << *TripCount << "\n");

// If the indexed reference is 'consecutive' the cost is		// If the indexed reference is 'consecutive' the cost is
// (TripCount*Stride)/CLS, otherwise the cost is TripCount.		// (TripCount*Stride)/CLS, otherwise the cost is TripCount.
const SCEV *RefCost = TripCount;		const SCEV *RefCost = TripCount;
		const SCEV *Stride = nullptr;
if (isConsecutive(L, CLS)) {		if (isConsecutive(L, CLS)) {
const SCEV *Coeff = getLastCoefficient();		const SCEV *Coeff = getLastCoefficient();
const SCEV *ElemSize = Sizes.back();		const SCEV *ElemSize = Sizes.back();
const SCEV *Stride = SE.getMulExpr(Coeff, ElemSize);		Stride = SE.getMulExpr(Coeff, ElemSize);
Type *WiderType = SE.getWiderType(Stride->getType(), TripCount->getType());		Type *WiderType = SE.getWiderType(Stride->getType(), TripCount->getType());
const SCEV *CacheLineSize = SE.getConstant(WiderType, CLS);		const SCEV *CacheLineSize = SE.getConstant(WiderType, CLS);
if (SE.isKnownNegative(Stride))		if (SE.isKnownNegative(Stride))
Stride = SE.getNegativeSCEV(Stride);		Stride = SE.getNegativeSCEV(Stride);
Stride = SE.getNoopOrAnyExtend(Stride, WiderType);		Stride = SE.getNoopOrAnyExtend(Stride, WiderType);
TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);		TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);
const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);		const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
RefCost = SE.getUDivExpr(Numerator, CacheLineSize);		RefCost = SE.getUDivExpr(Numerator, CacheLineSize);

LLVM_DEBUG(dbgs().indent(4)		LLVM_DEBUG(dbgs().indent(4)
<< "Access is consecutive: RefCost=(TripCount*Stride)/CLS="		<< "Access is consecutive: RefCost=(TripCount*Stride)/CLS="
<< *RefCost << "\n");		<< *RefCost << "\n");
} else		} else {
		Stride = getStepRecurrence(&L);
LLVM_DEBUG(dbgs().indent(4)		LLVM_DEBUG(dbgs().indent(4)
<< "Access is not consecutive: RefCost=TripCount=" << *RefCost		<< "Access is not consecutive: RefCost=TripCount=" << *RefCost
<< "\n");		<< "\n");
		}

// Attempt to fold RefCost into a constant.		// Attempt to fold RefCost into a constant.
if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))		if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost)) {
return ConstantCost->getValue()->getSExtValue();		if (auto ConstantStride = dyn_cast<SCEVConstant>(Stride))
		return {ConstantCost->getValue()->getSExtValue(),
		ConstantStride->getValue()->getSExtValue()};
		else {
		// For most parametric-sized loops the strides and trip counts
		// are not known constants. Try to estimate the strides using the
		// estimated trip counts.
		int64_t EstimatedStride = 1;
		if (getEstimatedStride(Stride, EstimatedTripCounts, EstimatedStride))
		return {ConstantCost->getValue()->getSExtValue(), EstimatedStride};
		return {ConstantCost->getValue()->getSExtValue(), -1};
		}
		}

LLVM_DEBUG(dbgs().indent(4)		LLVM_DEBUG(dbgs().indent(4)
<< "RefCost is not a constant! Setting to RefCost=InvalidCost "		<< "RefCost is not a constant! Setting to RefCost=InvalidCost "
"(invalid value).\n");		"(invalid value).\n");

return CacheCost::InvalidCost;		return CacheCost::InvalidCost;
}		}

bool IndexedReference::delinearize(const LoopInfo &LI) {		bool IndexedReference::delinearize(const LoopInfo &LI) {
assert(Subscripts.empty() && "Subscripts should be empty");		assert(Subscripts.empty() && "Subscripts should be empty");
assert(Sizes.empty() && "Sizes should be empty");		assert(Sizes.empty() && "Sizes should be empty");
assert(!IsValid && "Should be called once from the constructor");		assert(!IsValid && "Should be called once from the constructor");
LLVM_DEBUG(dbgs() << "Delinearizing: " << StoreOrLoadInst << "\n");		LLVM_DEBUG(dbgs() << "Delinearizing: " << StoreOrLoadInst << "\n");

const SCEV *ElemSize = SE.getElementSize(&StoreOrLoadInst);		const SCEV *ElemSize = SE.getElementSize(&StoreOrLoadInst);
const BasicBlock *BB = StoreOrLoadInst.getParent();		const BasicBlock *BB = StoreOrLoadInst.getParent();

if (Loop *L = LI.getLoopFor(BB)) {		if (Loop *L = LI.getLoopFor(BB)) {
const SCEV *AccessFn =		const SCEV *AccessFn =
SE.getSCEVAtScope(getPointerOperand(&StoreOrLoadInst), L);		SE.getSCEVAtScope(getPointerOperand(&StoreOrLoadInst), L);

		// Populate the access stride for each loop.
		Loop *CurLoop = L;
		while (CurLoop) {
		const SCEV *CurAccessFn =
		SE.getSCEVAtScope(getPointerOperand(&StoreOrLoadInst), CurLoop);
		setStepRecurrence(CurLoop, CurAccessFn);
		CurLoop = CurLoop->getParentLoop();
		}

BasePointer = dyn_cast<SCEVUnknown>(SE.getPointerBase(AccessFn));		BasePointer = dyn_cast<SCEVUnknown>(SE.getPointerBase(AccessFn));
if (BasePointer == nullptr) {		if (BasePointer == nullptr) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs().indent(2)		dbgs().indent(2)
<< "ERROR: failed to delinearize, can't identify base pointer\n");		<< "ERROR: failed to delinearize, can't identify base pointer\n");
return false;		return false;
}		}

[119 unchanged lines not shown; enclosing scope: bool IndexedReference::isAliased(const IndexedReference &Other,]
const auto &Loc1 = MemoryLocation::get(&StoreOrLoadInst);		const auto &Loc1 = MemoryLocation::get(&StoreOrLoadInst);
const auto &Loc2 = MemoryLocation::get(&Other.StoreOrLoadInst);		const auto &Loc2 = MemoryLocation::get(&Other.StoreOrLoadInst);
return AA.isMustAlias(Loc1, Loc2);		return AA.isMustAlias(Loc1, Loc2);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// CacheCost implementation		// CacheCost implementation
//		//
		CacheCostTy constexpr CacheCost::InvalidCost;

raw_ostream &llvm::operator<<(raw_ostream &OS, const CacheCost &CC) {		raw_ostream &llvm::operator<<(raw_ostream &OS, const CacheCost &CC) {
for (const auto &LC : CC.LoopCosts) {		for (const auto &LC : CC.LoopCosts) {
const Loop *L = LC.first;		const Loop *L = LC.first;
OS << "Loop '" << L->getName() << "' has cost = " << LC.second << "\n";		OS << "Loop '" << L->getName() << "' has cost = " << LC.second.first;
		if (LC.second.second != -1)
		OS << " and stride = " << LC.second.second << " \n";
}		}
return OS;		return OS;
}		}

CacheCost::CacheCost(const LoopVectorTy &Loops, const LoopInfo &LI,		CacheCost::CacheCost(const LoopVectorTy &Loops, const LoopInfo &LI,
ScalarEvolution &SE, TargetTransformInfo &TTI,		ScalarEvolution &SE, TargetTransformInfo &TTI,
AAResults &AA, DependenceInfo &DI, Optional<unsigned> TRT)		AAResults &AA, DependenceInfo &DI, Optional<unsigned> TRT)
: Loops(Loops),		: Loops(Loops),
TRT((TRT == None) ? Optional<unsigned>(TemporalReuseThreshold) : TRT),		TRT((TRT == None) ? Optional<unsigned>(TemporalReuseThreshold) : TRT),
LI(LI), SE(SE), TTI(TTI), AA(AA), DI(DI) {		LI(LI), SE(SE), TTI(TTI), AA(AA), DI(DI) {
assert(!Loops.empty() && "Expecting a non-empty loop vector.");		assert(!Loops.empty() && "Expecting a non-empty loop vector.");

for (const Loop *L : Loops) {		for (const Loop *L : Loops) {
unsigned TripCount = SE.getSmallConstantTripCount(L);		unsigned TripCount = SE.getSmallConstantTripCount(L);
TripCount = (TripCount == 0) ? DefaultTripCount : TripCount;		TripCount = (TripCount == 0) ? DefaultTripCount : TripCount;
TripCounts.push_back({L, TripCount});		TripCounts.push_back({L, TripCount});
		// Keep track of the mapping between the actual (possibly non-constant) trip
		// counts and the estimated trip counts, i.e., DefaultTripCount. This will
		// be useful later when estimating the strides for parametric-sized arrays.
		const SCEV *S = SE.getTripCountFromExitCount(
		SE.getExitCount(L, L->getExitingBlock()), /*Extend=*/false);
		EstimatedTripCounts[S] = TripCount;
}		}

calculateCacheFootprint();		calculateCacheFootprint();
}		}
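The constructor falls back to `DefaultTripCount` when `getSmallConstantTripCount` returns 0. It also records which estimate was used for each loop's (possibly symbolic) trip-count expression. The following is a minimal sketch of how such a table could turn a symbolic stride such as `8 * %m * %o` into a number. It assumes a default of 100 (the default of LoopCacheAnalysis's `-default-trip-count` option) and uses strings as a hypothetical stand-in for the `const SCEV *` keys. `estimateTripCount`, `TripCountMap`, and `evaluateStride` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Assumed fallback estimate, mirroring the default of LoopCacheAnalysis's
// -default-trip-count option.
constexpr unsigned DefaultTripCount = 100;

// 0 means "no small constant trip count", as with
// ScalarEvolution::getSmallConstantTripCount.
unsigned estimateTripCount(unsigned SmallConstantTripCount) {
  return SmallConstantTripCount == 0 ? DefaultTripCount
                                     : SmallConstantTripCount;
}

// Hypothetical stand-in for EstimatedTripCounts: the key is the printed form
// of a trip-count expression (e.g. "%o"), the value is the estimate used.
using TripCountMap = std::map<std::string, unsigned>;

// Evaluate Constant * Sym1 * Sym2 * ... by replacing each symbolic array
// extent with the estimate recorded for the matching trip count. Symbols with
// no recorded trip count fall back to DefaultTripCount.
uint64_t evaluateStride(uint64_t Constant,
                        const std::vector<std::string> &Symbols,
                        const TripCountMap &Estimated) {
  uint64_t Result = Constant;
  for (const std::string &Sym : Symbols) {
    auto It = Estimated.find(Sym);
    Result *= (It != Estimated.end()) ? It->second : DefaultTripCount;
  }
  return Result;
}
```

In `@foo2` the inner extents `%m` and `%o` are exactly the trip counts of `for.j` and `for.k`, which is what makes this kind of lookup possible. With both estimated at 100, the stride of `for.i` evaluates to 8 * 100 * 100 = 80000 bytes.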

std::unique_ptr<CacheCost>		std::unique_ptr<CacheCost>
CacheCost::getCacheCost(Loop &Root, LoopStandardAnalysisResults &AR,		CacheCost::getCacheCost(Loop &Root, LoopStandardAnalysisResults &AR,
DependenceInfo &DI, Optional<unsigned> TRT) {		DependenceInfo &DI, Optional<unsigned> TRT) {
[... 115 lines not shown ...]
CacheCost::computeLoopCacheCost(const Loop &L,
const ReferenceGroupsTy &RefGroups) const {		const ReferenceGroupsTy &RefGroups) const {
if (!L.isLoopSimplifyForm())		if (!L.isLoopSimplifyForm())
return InvalidCost;		return InvalidCost;

LLVM_DEBUG(dbgs() << "Considering loop '" << L.getName()		LLVM_DEBUG(dbgs() << "Considering loop '" << L.getName()
<< "' as innermost loop.\n");		<< "' as innermost loop.\n");

// Compute the product of the trip counts of each other loop in the nest.		// Compute the product of the trip counts of each other loop in the nest.
CacheCostTy TripCountsProduct = 1;		uint64_t TripCountsProduct = 1;
for (const auto &TC : TripCounts) {		for (const auto &TC : TripCounts) {
if (TC.first == &L)		if (TC.first == &L)
continue;		continue;
TripCountsProduct *= TC.second;		TripCountsProduct *= TC.second;
}		}

CacheCostTy LoopCost = 0;		CacheCostTy LoopCost = {0, 0};
for (const ReferenceGroupTy &RG : RefGroups) {		for (const ReferenceGroupTy &RG : RefGroups) {
CacheCostTy RefGroupCost = computeRefGroupCacheCost(RG, L);		CacheCostTy RefGroupCost = computeRefGroupCacheCost(RG, L);
LoopCost += RefGroupCost * TripCountsProduct;		LoopCost.first += RefGroupCost.first * TripCountsProduct;
		LoopCost.second = std::max(LoopCost.second, RefGroupCost.second);
}		}

LLVM_DEBUG(dbgs().indent(2) << "Loop '" << L.getName()		LLVM_DEBUG(dbgs().indent(2)
<< "' has cost=" << LoopCost << "\n");		<< "Loop '" << L.getName() << "' has cost=" << LoopCost.first
		<< " and stride= " << LoopCost.second << " \n");

return LoopCost;		return LoopCost;
}		}
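`CacheCostTy` now carries two values. `computeLoopCacheCost` sums the reference-group costs into the first field, each scaled by the product of the other loops' trip counts, and keeps the largest reference-group stride in the second. Below is a minimal sketch of that combination step. It assumes the pair is modeled as `std::pair<int64_t, int64_t>`, since the exact typedef in `LoopCacheAnalysis.h` is not shown in this excerpt. `CostAndStride` and `combineRefGroups` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// (cost, stride) for one candidate innermost loop; modeled as a pair here.
using CostAndStride = std::pair<int64_t, int64_t>;

// Costs of all reference groups are added up, each scaled by the product of
// the other loops' trip counts; the stride reported for the loop is the
// largest stride among the reference groups.
CostAndStride combineRefGroups(const std::vector<CostAndStride> &RefGroupCosts,
                               int64_t TripCountsProduct) {
  CostAndStride LoopCost = {0, 0};
  for (const CostAndStride &RG : RefGroupCosts) {
    LoopCost.first += RG.first * TripCountsProduct;
    LoopCost.second = std::max(LoopCost.second, RG.second);
  }
  return LoopCost;
}
```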

CacheCostTy CacheCost::computeRefGroupCacheCost(const ReferenceGroupTy &RG,		CacheCostTy CacheCost::computeRefGroupCacheCost(const ReferenceGroupTy &RG,
const Loop &L) const {		const Loop &L) const {
assert(!RG.empty() && "Reference group should have at least one member.");		assert(!RG.empty() && "Reference group should have at least one member.");

const IndexedReference *Representative = RG.front().get();		const IndexedReference *Representative = RG.front().get();
return Representative->computeRefCost(L, TTI.getCacheLineSize());		return Representative->computeRefCost(L, EstimatedTripCounts,
		TTI.getCacheLineSize());
}		}
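`computeRefGroupCacheCost` delegates to the group representative's `computeRefCost`, whose body lies outside this excerpt. The sketch below is a simplified model, not the LLVM implementation. It assumes a reference is "consecutive" along a loop when its per-iteration stride is smaller than a cache line. In that case it touches `TripCount * Stride / CacheLineSize` lines (integer division); otherwise it touches one new line per iteration. The inputs are a trip count of 100, a 128-byte cache line (an assumption for the PowerPC target used by the test), and per-loop strides of 8, 800 and 80000 bytes. With these, the model reproduces the costs in the new test's CHECK lines. `refCost` and `loopCost` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumption): number of cache lines one reference touches
// while the candidate innermost loop runs TripCount iterations.
int64_t refCost(int64_t TripCount, int64_t StrideBytes, int64_t CacheLineSize) {
  bool Consecutive = StrideBytes < CacheLineSize;
  if (Consecutive)
    return (TripCount * StrideBytes) / CacheLineSize; // integer (floor) division
  return TripCount; // every iteration assumed to touch a new line
}

// Loop cost: reference cost times the product of the other loops' trip counts.
int64_t loopCost(int64_t RefCost, int64_t OtherTripCountsProduct) {
  return RefCost * OtherTripCountsProduct;
}
```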

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// LoopCachePrinterPass implementation		// LoopCachePrinterPass implementation
//		//
PreservedAnalyses LoopCachePrinterPass::run(Loop &L, LoopAnalysisManager &AM,		PreservedAnalyses LoopCachePrinterPass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,		LoopStandardAnalysisResults &AR,
LPMUpdater &U) {		LPMUpdater &U) {
Function *F = L.getHeader()->getParent();		Function *F = L.getHeader()->getParent();
DependenceInfo DI(F, &AR.AA, &AR.SE, &AR.LI);		DependenceInfo DI(F, &AR.AA, &AR.SE, &AR.LI);

if (auto CC = CacheCost::getCacheCost(L, AR, DI))		if (auto CC = CacheCost::getCacheCost(L, AR, DI))
OS << *CC;		OS << *CC;

return PreservedAnalyses::all();		return PreservedAnalyses::all();
}		}
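The code that sorts the loops is not part of this excerpt. As the summary describes, ordering by cost alone cannot separate `for.i` and `for.j`, since both cost 1000000 in the new test. The following is a minimal sketch of a (cost, stride) ordering. It assumes that, on equal cost, the loop with the larger stride is placed further out. `LoopCostEntry`, `orderLoops`, and `orderLoopsByCostOnly` are hypothetical names, and the input values come from the test's CHECK lines in the original nest order j, i, k.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One loop's result: name, estimated cache cost, and access stride in bytes.
struct LoopCostEntry {
  std::string Name;
  int64_t Cost;
  int64_t Stride;
};

static std::vector<std::string> names(const std::vector<LoopCostEntry> &Entries) {
  std::vector<std::string> Names;
  for (const LoopCostEntry &E : Entries)
    Names.push_back(E.Name);
  return Names;
}

// Cost only: higher cost goes further out. The sort is stable, so loops with
// equal cost keep their original nest order.
std::vector<std::string> orderLoopsByCostOnly(std::vector<LoopCostEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const LoopCostEntry &A, const LoopCostEntry &B) {
                     return A.Cost > B.Cost;
                   });
  return names(Entries);
}

// Cost first, then stride: on equal cost, the loop with the larger stride
// goes further out (assumed tie-break).
std::vector<std::string> orderLoops(std::vector<LoopCostEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const LoopCostEntry &A, const LoopCostEntry &B) {
                     if (A.Cost != B.Cost)
                       return A.Cost > B.Cost;
                     return A.Stride > B.Stride;
                   });
  return names(Entries);
}
```

The cost-only ordering yields `for.j, for.i, for.k`, matching the [j-loop, i-loop, k-loop] result the summary attributes to the current analysis. The cost-then-stride ordering yields the desired `for.i, for.j, for.k`.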

llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll

	[... 69 lines not shown ...]

	for.end.loopexit: ; preds = %for.inci			for.end.loopexit: ; preds = %for.inci
	br label %for.end			br label %for.end

	for.end: ; preds = %for.end.loopexit, %for.cond1.preheader.lr.ph, %entry			for.end: ; preds = %for.end.loopexit, %for.cond1.preheader.lr.ph, %entry
	ret void			ret void
	}			}

				; void foo(long n, long m, long o, int A[n][m][o]) {
				bmahjour: Could you please make it clearer how this test differs from the one above (i.e., add a comment saying that it checks that the analysis prints the loops in the expected order even though the original loop nest is in a suboptimal order)?
				;   for (long j = 0; j < m; j++)
				;     for (long i = 0; i < n; i++)
				;       for (long k = 0; k < o; k++)
				;         A[2*i+3][2*j-4][2*k+7] = 1;
				; }

				; CHECK-DAG: Loop 'for.i' has cost = 1000000 and stride = 80000
				; CHECK-DAG: Loop 'for.j' has cost = 1000000 and stride = 800
				; CHECK-DAG: Loop 'for.k' has cost = 60000 and stride = 8

				define void @foo2(i64 %n, i64 %m, i64 %o, i32* %A) {
				entry:
				bmahjour: Use CHECK and CHECK-NEXT
				%cmp32 = icmp sgt i64 %n, 0
				%cmp230 = icmp sgt i64 %m, 0
				%cmp528 = icmp sgt i64 %o, 0
				br i1 %cmp32, label %for.cond1.preheader.lr.ph, label %for.end

				for.cond1.preheader.lr.ph: ; preds = %entry
				br i1 %cmp230, label %for.j.preheader, label %for.end

				for.j.preheader: ; preds = %for.cond1.preheader.lr.ph
				br i1 %cmp528, label %for.j.preheader.split, label %for.end

				for.j.preheader.split: ; preds = %for.j.preheader
				br label %for.j

				for.i: ; preds = %for.inci, %for.j
				%i = phi i64 [ %inci, %for.inci ], [ 0, %for.j ]
				%mul8 = shl i64 %i, 1
				%add9 = add nsw i64 %mul8, 3
				%0 = mul i64 %add9, %m
				%sub = add i64 %0, -4
				%mul7 = mul nsw i64 %j, 2
				%tmp = add i64 %sub, %mul7
				%tmp27 = mul i64 %tmp, %o
				br label %for.k

				for.j: ; preds = %for.incj, %for.j.preheader.split
				%j = phi i64 [ %incj, %for.incj ], [ 0, %for.j.preheader.split ]
				br label %for.i

				for.k: ; preds = %for.k, %for.i
				%k = phi i64 [ 0, %for.i ], [ %inck, %for.k ]

				%mul = mul nsw i64 %k, 2
				%arrayidx.sum = add i64 %mul, 7
				%arrayidx10.sum = add i64 %arrayidx.sum, %tmp27
				%arrayidx11 = getelementptr inbounds i32, i32* %A, i64 %arrayidx10.sum
				store i32 1, i32* %arrayidx11, align 4

				%inck = add nsw i64 %k, 1
				%exitcond.us = icmp eq i64 %inck, %o
				br i1 %exitcond.us, label %for.inci, label %for.k

				for.incj: ; preds = %for.inci
				%incj = add nsw i64 %j, 1
				%exitcond54.us = icmp eq i64 %incj, %m
				br i1 %exitcond54.us, label %for.end.loopexit, label %for.j

				for.inci: ; preds = %for.k
				%inci = add nsw i64 %i, 1
				%exitcond55.us = icmp eq i64 %inci, %n
				br i1 %exitcond55.us, label %for.incj, label %for.i

				for.end.loopexit: ; preds = %for.incj
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %for.cond1.preheader.lr.ph, %entry
				ret void
				}
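The strides expected by the CHECK lines above can be reproduced by hand. For a row-major `int A[n][m][o]`, one step of the loop that drives a subscript with coefficient c moves the address by c, times the product of all inner extents, times the element size. The following is a minimal sketch. It assumes a 4-byte `int` and every extent estimated at 100, the default trip count used when the bounds are not constant. `strideForDim` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte stride of A[c0*x0 + d0][c1*x1 + d1]...[cN*xN + dN] with respect to the
// loop whose induction variable appears in subscript position Dim, for a
// row-major array with the given extents and element size. Incrementing that
// induction variable by 1 moves subscript Dim by Coeffs[Dim], and each unit
// of subscript Dim spans the product of all inner extents.
int64_t strideForDim(const std::vector<int64_t> &Extents,
                     const std::vector<int64_t> &Coeffs, std::size_t Dim,
                     int64_t ElemSize) {
  int64_t InnerElems = 1;
  for (std::size_t D = Dim + 1; D < Extents.size(); ++D)
    InnerElems *= Extents[D];
  return Coeffs[Dim] * InnerElems * ElemSize;
}
```

Dimension 0 is driven by `i`, dimension 1 by `j` and dimension 2 by `k`, giving 80000, 800 and 8 bytes. The constant offsets (+3, -4, +7) do not affect the stride.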

Diff 419304
