This is an archive of the discontinued LLVM Phabricator instance.

Differential D63459

Loop Cache Analysis
ClosedPublic

Authored by etiotto on Jun 17 2019, 2:41 PM.


Details

Reviewers

hfinkel
Meinersbur
jdoerfert
kbarton
bmahjour
anemet
fhahn

Commits

rL368624: Title: Fix build warning for operator<< when using GCC 7.
rGdd3b6498b016: Title: Loop Cache Analysis Summary: Implement a new analysis to estimate the…
rL368439: Title: Loop Cache Analysis

Summary

Implement a new analysis to estimate the number of cache lines required by a loop nest.
The analysis is largely based on the following paper:

Compiler Optimizations for Improving Data Locality
By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf

The analysis considers temporal reuse (accesses to the same memory location) and spatial reuse (accesses to memory locations within a cache line). For simplicity the analysis considers memory accesses in the innermost loop in a loop nest, and thus determines the number of cache lines used when the loop L in loop nest LN is placed in the innermost position.

The result of the analysis can be used to drive several transformations. As an example, loop interchange could use it to determine which loops in a perfect loop nest should be interchanged to maximize cache reuse. Similarly, loop distribution could be enhanced to take into consideration cache reuse between arrays when distributing a loop to eliminate vectorization-inhibiting dependencies.

The general approach taken to estimate the number of cache lines used by the memory references in the inner loop of a loop nest is:

1. Partition memory references that exhibit temporal or spatial reuse into reference groups.
2. For each loop L in the loop nest LN:
   a. Compute the cost of each reference group.
   b. Compute the 'cache cost' of the loop nest by summing up the reference group costs.

For further details of the algorithm please refer to the paper.
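
To make the cost model in the summary concrete, here is a minimal standalone sketch of the three reference-cost cases described in the cited paper (loop-invariant, consecutive, and everything else). It is not the code from this patch: `RefInfo`, `refGroupCost` and `loopCost` are hypothetical names, and the classification of each reference is assumed to be given rather than derived from SCEV.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one reference group's representative access,
// classified with respect to the loop that is being costed.
struct RefInfo {
  bool InvariantInLoop; // address does not depend on the loop's induction variable
  bool Consecutive;     // address advances by a small fixed stride each iteration
  int64_t StrideBytes;  // bytes advanced per iteration (used when Consecutive)
};

// Estimated cache lines touched by one reference group if the loop with the
// given trip count is placed innermost.
int64_t refGroupCost(const RefInfo &R, int64_t TripCount, int64_t CacheLineSize) {
  assert(TripCount > 0 && CacheLineSize > 0);
  if (R.InvariantInLoop)
    return 1; // the same location is reused on every iteration
  if (R.Consecutive && R.StrideBytes < CacheLineSize)
    // Several iterations share one line: ceil(TripCount * Stride / LineSize).
    return (TripCount * R.StrideBytes + CacheLineSize - 1) / CacheLineSize;
  return TripCount; // a new line on every iteration
}

// Loop cost: sum of the reference group costs, scaled by the product of the
// trip counts of the other loops in the nest.
int64_t loopCost(const std::vector<RefInfo> &Groups, int64_t TripCount,
                 int64_t OtherLoopsTripProduct, int64_t CacheLineSize) {
  int64_t Sum = 0;
  for (const RefInfo &R : Groups)
    Sum += refGroupCost(R, TripCount, CacheLineSize);
  return Sum * OtherLoopsTripProduct;
}
```

With 100 iterations, 8-byte elements and a 64-byte cache line, a consecutive reference costs 13 lines, an invariant one costs 1, and a non-consecutive one costs 100.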

Diff Detail

Repository: rL LLVM

Event Timeline

etiotto created this revision.Jun 17 2019, 2:41 PM

Herald added a project: Restricted Project. Jun 17 2019, 2:41 PM

Herald added subscribers: llvm-commits, jsji, mgrang and 2 others.

etiotto edited the summary of this revision. Jun 17 2019, 2:44 PM

What do you plan as the first use of this transformation?

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
31–32	[style] The current code base most often uses a `Ty` suffix for typedefs.
175	[suggestion] Make these magic numbers `cl::opt` options?
llvm/lib/Analysis/LoopCacheAnalysis.cpp
64–65	Is passing a single innermost loop a precondition or what this function computes? It seems the only loop that this function might return is `Loops.back()`, and it kind of verifies that the loop vector is in the right order?
308	[nit] double space
320	[style] `Subscripts.empty()`. `Sizes.empty()`
333	[style] Did you mean break/return here? The next iteration might add elements to it. I'd structure this as `if (!isOneDimensionalArray(AccessFn, L)) { ... break; } ...`, conforming to https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code.
337	[style] `return all_of(Subscripts, ...`
387	Why not use `getBackedgeTakenCount()`?
524–532	Since this only has code that executes in assert-builds, it should be guarded entirely by an `LLVM_DEBUG`.
591–592	If the intention is to provide analysis for innermost loops only, why this limitation? Could it just return an analysis result for each innermost loop? If the analysis requires a global view to determine the cost for each loop, wouldn't a FunctionPass be more appropriate? Currently, it seems users first need to get the LoopCacheAnalysis for a topmost loop, then ask it for one of its nested loops. Are such loop nests not analyzable at all?
	while (repeat) {
	  for (int i = ...)
	    for (int j = ...)
	      B[i][j] = ... A[i+1][j+1] ... // stencil
	  for (int i = ...)
	    for (int j = ...)
	      A[i][j] = ... B[i+1][j+1] ... // stencil
	}
llvm/test/Analysis/LoopCacheAnalysis/a.ll
1 ↗	(On Diff #205185)	[serious] Can you give more descriptive file names than `a.ll`, `b.ll`, etc.?
4 ↗	(On Diff #205185)	[suggestion] Is it possible to leave the triple unspecified in the test case?

Whitney added a subscriber: Whitney.Jun 18 2019, 6:20 AM

etiotto marked 16 inline comments as done.Jun 18 2019, 2:20 PM

etiotto added inline comments.

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
31–32	OK I'll use the Ty suffix.
175	Added cl::opt for the default loop trip count and the temporal reuse threshold (see the .cpp file)
llvm/lib/Analysis/LoopCacheAnalysis.cpp
64–65	This function attempts to retrieve the innermost loop in the given loop vector. It returns a nullptr if any loop in the supplied loop vector has more than one sibling. The loop vector is expected to contain loops collected in breadth-first order. I'll improve the comment.
387	I'll do that.
591–592	The current scope of this PR is to analyze loop nests that have a single innermost loop. The analysis returns a vector of loop costs for each loop in a loop nest. This is not a hard requirement; however, I would like to extend the scope of the transformation in a future PR. One of the scenarios I had in mind, at least initially, as a consumer of this analysis is loop interchange, which operates on perfect nests, and therefore the current implementation of the analysis is sufficient for that use. If we were to make the analysis a function pass, that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive, especially considering that loop interchange is currently a loop pass.
llvm/test/Analysis/LoopCacheAnalysis/a.ll
1 ↗	(On Diff #205185)	Ooops, yes I'll rename those files.
4 ↗	(On Diff #205185)	Unfortunately it is not easily removed, because the target triple is necessary to determine the target architecture's cache line size, which is used in the analysis.
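
The innermost-loop check described in the reply on lines 64–65 can be sketched without any LLVM types. This is only an illustration of the idea (a breadth-first loop vector has a single innermost loop when the depths strictly increase); `ToyLoop` and `getInnermostLoop` are hypothetical names, not the patch's code.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for a loop; only the nesting depth matters for this check.
struct ToyLoop {
  unsigned Depth; // 1 for the outermost loop
};

// Loops is assumed to be in breadth-first order. If every nesting level holds
// exactly one loop, depths strictly increase and the last entry is the single
// innermost loop. Otherwise some loop has a sibling and nullptr is returned.
const ToyLoop *getInnermostLoop(const std::vector<const ToyLoop *> &Loops) {
  if (Loops.empty())
    return nullptr;
  auto NotDeeper = [](const ToyLoop *A, const ToyLoop *B) {
    return B->Depth <= A->Depth;
  };
  if (std::adjacent_find(Loops.begin(), Loops.end(), NotDeeper) != Loops.end())
    return nullptr;
  return Loops.back();
}
```

A strict comparison is used deliberately here, so that two loops at the same depth are reported as siblings.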

Addressed review comment from M. Kruse.

xbolva00 added a subscriber: xbolva00.Jun 18 2019, 2:52 PM

xbolva00 added inline comments.

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
220	llvm::sort
llvm/lib/Analysis/LoopCacheAnalysis.cpp
365	llvm::all_of

fhahn added a subscriber: fhahn.Jun 18 2019, 2:54 PM

fhahn added inline comments.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
591–592	I'll try to have a closer look in a few days, but I think for the use in LoopInterchange, it would not need to be an analysis pass (I suppose there won't be a big benefit to preserve this as an analysis?). I think a lightweight interface to query the cost for certain valid permutations would be sufficient. I think it would be great if we only compute the information when directly required (i.e. we only need to compute the costs for loops we can interchange, for the permutations valid to interchange)

How would a pass use this analysis? It computes a cost for the current IR, but there is nothing to compare it to unless the transformation pass emits the transformed loop nest next to the original pass such that the LoopCacheAnalysis can compute its cost.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
564	Would it be useful to make `CacheCostTy` an `int64_t`? At least with UBSan we could diagnose an overflow.

etiotto marked an inline comment as done.Jun 18 2019, 4:10 PM

etiotto added inline comments.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
591–592	Looking forward to your comments. You are correct, we could teach loop interchange about the locality of accesses in a loop nest. There are several other loop transformations that can benefit from it. The paper that introduces the analysis (Compiler Optimizations for Improving Data Locality - http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf) illustrates how the analysis was used to guide several other loop transformations, namely loop reversal, loop fusion, and loop distribution. Given the applicability of the analysis to several transformations, I think it would make sense to centralize the code as an analysis pass.

etiotto updated this revision to Diff 205570.Jun 19 2019, 6:37 AM

etiotto marked 3 inline comments as done.

etiotto marked 2 inline comments as done.Jun 19 2019, 6:42 AM

etiotto added inline comments.

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
220	Yup agree, thanks for pointing this out.
llvm/lib/Analysis/LoopCacheAnalysis.cpp
564	Ok.

In D63459#1549307, @Meinersbur wrote:

How would a pass use this analysis? It computes a cost for the current IR, but there is nothing to compare it to unless the transformation pass emits the transformed loop nest next to the original pass such that the LoopCacheAnalysis can compute its cost.

The result of the analysis will be a vector of costs, one for each loop in a loop nest. The cost associated with a loop estimates the number of cache lines used if the loop were placed in the innermost position in the nest. Therefore, sorting the cost vector in descending order corresponds to minimizing the number of data cache misses in the nest. For example, loop distribution can use the sorted vector and work out a set of permutation moves (assuming legality constraints are satisfied) to maximize cache locality.
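
As a rough sketch of how a client might consume such a cost vector, the snippet below sorts (loop, cost) pairs in descending order to obtain an outermost-to-innermost ordering. The `LoopCost` alias and `preferredOrder` are hypothetical, the costs in the usage note are illustrative values, and legality checks are assumed to happen elsewhere.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// (loop name, estimated cache lines used if that loop were innermost)
using LoopCost = std::pair<std::string, int64_t>;

// Returns loop names ordered outermost-first: the most expensive loop goes
// outermost and the cheapest loop ends up in the innermost position.
std::vector<std::string> preferredOrder(std::vector<LoopCost> Costs) {
  std::stable_sort(Costs.begin(), Costs.end(),
                   [](const LoopCost &A, const LoopCost &B) {
                     return A.second > B.second;
                   });
  std::vector<std::string> Order;
  for (const LoopCost &C : Costs)
    Order.push_back(C.first);
  return Order;
}
```

For costs of 2010000 for `i`, 270000 for `j` and 1140000 for `k`, the resulting order is `i`, `k`, `j`, which places `j` innermost.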

steleman added a subscriber: steleman.Jun 19 2019, 10:56 AM

dmgreen added a subscriber: dmgreen.Jun 20 2019, 8:53 AM

greened added a subscriber: greened.Jun 20 2019, 1:20 PM

etiotto marked 3 inline comments as done.Jun 21 2019, 12:12 PM

venkataramanan.kumar.llvm added a subscriber: venkataramanan.kumar.llvm.Jun 26 2019, 4:02 AM

xusx595 added a subscriber: xusx595.Jul 2 2019, 1:48 AM

Hahnfeld added a subscriber: Hahnfeld.Jul 2 2019, 2:06 PM

ping

Herald added a subscriber: wuzish. Jul 4 2019, 7:09 AM

@fhahn Do you still want to look over this patch?

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
55	Should return `size_t`
159	It does not make sense to have owning pointers to SmallVector's. This just adds another level of indirection compared to containing `std::vector` (or `SmallVector` to have a single allocation for all elements if none exceeds the small size) directly.
llvm/lib/Analysis/LoopCacheAnalysis.cpp
52	Before `const_cast`, why not remove the `const` from the parameter list?
56	`WL` is used as a stack, a (Small)Vector could do this as well.
103	Don’t “almost always” use auto: `for (const SCEV *Subscript : R.subscripts)`. You seem to have applied the coding standard for auto everywhere except in foreach-loops.
117	No need for `Subscripts(), Sizes()`. Prefer default member initializers at declaration over `IsValid(false), BasePointer(nullptr)`.
121	[subjective] I'd prefer a function that returns either a `nullptr`/`ErrorOr<IndexedReference>` rather than an `IsValid` flag for every object.
146	Nice, this is where the coding standard recommends `auto`.
173–176	Some compilers may complain about empty statements in non-debug builds. Alternative:
	LLVM_DEBUG({
	  if (InSameCacheLine)
	    dbgs().indent(2) << "Found spacial reuse.\n";
	  else
	    dbgs().indent(2) << "No spacial reuse.\n";
	});
208	Prefer `int` over `unsigned`
499–500	The LLVM code base does not use `const` for stack variables.
538	Do you need wrapping behavior? `int` instead of `unsigned`.
591–592	"If we were to make the analysis a function pass that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive especially considering that loop interchange is currently a loop pass." I don't think the new pass manager has this restriction. Passes can ask for analyses of any level using OuterAnalysisManagerProxy. The new pass manager caches everything.

etiotto marked 17 inline comments as done.Jul 9 2019, 8:05 AM

etiotto added inline comments.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
52	I dropped the const qualifier in the parameter declaration and adjusted the code accordingly.
103	OK I'll fix the use of auto in foreach-loops.
146	Thanks.
173–176	Agreed. It is cleaner the way you propose, as there is only one LLVM_DEBUG.
499–500	const qualifying local variables preempts unintended modifications later in the function... but I do not feel strongly about it. I'll change it.
591–592	Yes, it is also my understanding (from the comments in PassManager.h) that the new pass manager does allow an "inner" pass (e.g. a Loop Pass) to ask for an "outer" analysis (e.g. a Function Analysis). However, at least from the comments in PassManager.h, the inner pass cannot cause the outer analysis to run and can only rely on the cached version, which may give back a nullptr.

Addressed Michael Kruse comments.

fhahn added inline comments.Jul 9 2019, 8:21 AM

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
220	Do you anticipate most users relying on the loops being ordered by cost? If not, it might be worth switching to a map to avoid using `find_if` in multiple places (even though it is not too common in user-written C/C++ code, I've encountered various (auto-generated) sources which have very deep nesting levels of loops). LoopInterchange, for example, would be interested in comparing just the total cache cost of various re-orderings of a loop nest.
llvm/lib/Analysis/LoopCacheAnalysis.cpp
591–592	"Given the applicability of the analysis to several transformation I think it would make sense to centralize the code as an analysis pass." I agree it is a useful utility to have; I am just wondering what the benefit of exposing it as an analysis pass would be, as it is unlikely that the result would be used for most loops or could be cached between interested transforms frequently. IMO it would be fine as a utility class/function, i.e. just provide a `getLoopCacheCost()` function that takes a root loop and maybe a potential re-ordering of loops, which the interested transforms can use only on the loops that they can transform. I think that would reduce the size of the patch a bit and focus on the important bits.

Meinersbur added inline comments.Jul 9 2019, 11:23 AM

llvm/lib/Analysis/LoopCacheAnalysis.cpp
591–592	I convinced myself that indeed a FunctionPass might not be that nice, because any LoopPass would need to preserve it. As @fhahn points out, wanting to preserve/reuse the analysis might be rare.

etiotto marked 6 inline comments as done.Jul 9 2019, 4:14 PM

etiotto added inline comments.

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
220	I was envisioning Loop Interchange wanting to get a sorted vector of cache costs for a nest and then using it to determine the optimal reordering of the loop in the nest (loop with smaller cost in the innermost position, larger cost in the outermost position). Having the cache cost vector sorted is also handy when printing it.
llvm/lib/Analysis/LoopCacheAnalysis.cpp
48	Will remove this static function and instead use the breadth_first(&Root) iterator from ADT/BreadthFirstIterator.h
591–592	I was thinking that having it as an analysis pass would allow loop transformations that do not modify the memory references in a loop nest to preserve the analysis... I am OK with just adding a member function to the CacheCost class to construct and return the cache cost for a loop nest rooted by a given loop. I will upstream a patch to make that change and for the time being avoid making this an analysis pass.
591–592	I will upstream a patch as suggested by @fhahn to just provide a member function to the CacheCost class to construct and return the cache cost for a loop nest rooted by a given loop.

Addressed suggestions from @fhahn and dropped the pass as an analysis. Instead I provided a static member function in the CacheCost class to compute the cache cost of a nest rooted by a given loop.

Note: Because I removed the getLoops static function in the last patch, older comments in LoopCacheAnalysis.cpp are unfortunately no longer attached to the correct line :-(

@fhahn @Meinersbur do you have further comments? If not can this be approved?

LGTM, some nitpicks left.

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
182	`CacheCostTy` is `int64_t`; did you mean to assign `-1`?
201	To not leak the internal list implementation, return `ArrayRef<LoopCacheCostTy>`.
llvm/lib/Analysis/LoopCacheAnalysis.cpp
168	Don’t “almost always” use auto

This revision is now accepted and ready to land.Jul 17 2019, 1:21 PM

I would like to take another look, but do not want to block this for too long. Please feel free to commit if you don't hear from me by Monday.

llvm/test/Analysis/LoopCacheAnalysis/loads-store.ll
5	We need to make sure that the PPC backend was built when running those tests. Usually we put target-specific tests in target subdirectories, with a local lit.local.cfg checking for the backend (e.g. see llvm/test/Transforms/LoopVectorize/PowerPC/lit.local.cfg).

greened added inline comments.Jul 18 2019, 10:28 AM

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
77	Referencing a `cl::opt` defined in the `.cpp`? It's a little confusing.
108	Why is this part of `IndexedReference`?
llvm/lib/Analysis/LoopCacheAnalysis.cpp
45	What units is this in? What does "2" mean?
190	This seems a bit dangerous to me. Depending on the client, we might want to assume reuse if we don't know the distance. Could this function return a tribool of (yes/no/unknown)?

greened added inline comments.Jul 18 2019, 5:53 PM

llvm/test/Analysis/LoopCacheAnalysis/loads-store.ll
2	I'd like to see block comments at the top of all of these tests explaining what they are testing. It will be much easier to understand what's going on when these tests fail.

fhahn added inline comments.Jul 22 2019, 6:43 AM

llvm/lib/Analysis/LoopCacheAnalysis.cpp
257	RefCost is guaranteed to be a SCEVConstant here, so it would be better to use cast<> instead. Or even better:
	if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
	  return ConstantCost->getValue()->getSExtValue();
	LLVM_DEBUG(dbgs().indent(4)
	           << "RefCost is not a constant! Setting to RefCost=InvalidCost "
	              "(invalid value).\n");
	return CacheCost::InvalidCost;
268	It looks like this function does not capture much and might be better as a separate member function?
275	nit: Capitalize start of sentence.
281	nit: Capitalize start of sentence.
457	\n at the end?
472	This should be passed in I think and the users should request it via the pass manager.

Addressing code review comments given by @Meinersbur and @fhahn.

Herald added subscribers: MaskRay, nemanjai. Jul 24 2019, 6:33 AM

In D63459#1590173, @fhahn wrote:

I would like to take another look, but do not want to block this for too long. Please feel free to commit if you don't hear from me by Monday.

Done.

Address remaining comments from @fhahn

etiotto marked 10 inline comments as done.Jul 24 2019, 9:39 AM

etiotto added inline comments.

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
77	I'll pass it as an argument.
108	I changed into a static function in the .cpp file instead.
llvm/lib/Analysis/LoopCacheAnalysis.cpp
45	I clarified the description.
268	OK I'll make it a private member function.

etiotto updated this revision to Diff 211532.Jul 24 2019, 9:41 AM

etiotto marked 4 inline comments as done.

greened added inline comments.Jul 24 2019, 10:13 AM

llvm/lib/Analysis/LoopCacheAnalysis.cpp
45	I'm still a bit confused. Does this mean that `a[i]` and `a[i+1]` will be considered to have temporal reuse with a value of `2`? Perhaps "temporal reuse" itself needs a definition. Are we talking about referencing the same exact memory location, the same cache line, the same page, ...? It seems odd to me that "temporal reuse" would mean anything other than accessing exactly the same memory location. Everything else I would consider to be "spatial reuse." At the very least this deserves a longer comment block about what it means and its implications. Some clients want the definition of "temporal reuse" to be "access the exact same memory location" and this default value seems to mean something very different.
240	The debug message should reference `MaxDistance` and not hard-code the value `2`.

etiotto marked 5 inline comments as done.Jul 24 2019, 2:42 PM

etiotto added inline comments.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
45	That's correct, a[i] and a[i+1] are considered to have temporal reuse when the threshold is 2. A threshold of 1 would cause only references to the same memory location to have temporal reuse. The analysis attempts to implement the algorithm in the paper mentioned in the summary, and I agree that this is a bit confusing. I will add a comment to attempt to clarify better.
190	If the dependence distance is unknown at compile time, the references are conservatively considered to have no spatial reuse, and consequently the analysis will overestimate the number of cache lines used by the loop (when it is in the innermost position in the nest). For now I can return an Optional<bool> (I didn't find a tribool data type readily available in LLVM). This will make it easier to place references with 'unknown' distance in the same reference group if we find a motivating test case that needs it.
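
The yes/no/unknown result discussed for line 190 can be illustrated with a small standalone sketch. Here `std::optional<bool>` stands in for LLVM's `Optional<bool>`, the reuse test is simplified to a byte distance below the cache line size, and `hasSpatialReuse` and `reuseOr` are hypothetical names rather than the functions in the patch.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Tri-state answer: true or false when the byte distance between two
// references is known at compile time, std::nullopt when it is not.
std::optional<bool> hasSpatialReuse(std::optional<int64_t> ByteDistance,
                                    int64_t CacheLineSize) {
  if (!ByteDistance)
    return std::nullopt; // unknown distance: let the client decide
  return std::llabs(*ByteDistance) < CacheLineSize;
}

// A conservative client maps "unknown" to false (no reuse, so the cost is
// overestimated); a client with different needs could pass true instead.
bool reuseOr(std::optional<bool> Answer, bool IfUnknown) {
  return Answer.value_or(IfUnknown);
}
```

Keeping the unknown case separate from the false case leaves the policy decision to each client, which is the concern raised in the review.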

Addressing review comment from @reames

I addressed all pending review comments, @fhahn @reames does it look ok to you guys now?

Can you re-run clang-format on the latest version of the patch? I think it would be good to get an in-tree user of this soon, to make sure the modeling works as expected on real hardware/benchmarks. Do you have a timeline to get this used? I think you mentioned LoopFusion as one of the first planned users?

llvm/include/llvm/Analysis/LoopCacheAnalysis.h
199	IIRC this returns the ordered loop costs, right? Please document.
llvm/test/Analysis/LoopCacheAnalysis/PowerPC/matvecmul.ll
17 ↗	(On Diff #211602)	IIRC the costs should be ordered. If so, please use CHECK-NEXT for entries with differing costs (here and at other places).

Run clang-format, address comments from @fhahn.

In D63459#1605923, @fhahn wrote:

Can you re-run clang-format on the latest version of the patch? I think it would be good to get an in-tree user of this soon, to make sure the modeling works as expected on real hardware/benchmarks. Do you have a timeline to get this used? I think you mentioned LoopFusion as one of the first planned users?

I reformatted the patch and addressed inline comments. The paper reports that the analysis was used to guide both loop fusion and loop interchange. I used it myself profitably on loop interchange (not yet in tree), and the lit tests provided are based in part on that experiment.

If there are no further comments and/or concerns I'll commit the patch on Thursday.

In D63459#1607143, @etiotto wrote:

In D63459#1605923, @fhahn wrote:

Can you re-run clang-format on the latest version of the patch? I think it would be good to get an in-tree user of this soon, to make sure the modeling works as expected on real hardware/benchmarks. Do you have a timeline to get this used? I think you mentioned LoopFusion as one of the first planned users?

I reformatted the patch and addressed inline comments. The paper reports that the analysis was used to guide both loop fusion and loop interchange. I used it myself profitably on loop interchange (not yet in tree), and the lit tests provided are based in part on that experiment.

Right, are you planning on submitting a patch for loop interchange upstream?

I used the printer on some of the loop interchange tests, but the delinearization does not support some cases there yet. I hope I find some time to look into that next week and maybe also integrating it into loop interchange unless you plan to do so.

If there are not further comments and/or concerns I'll commit the patch on Thursday.

Right, are you planning on submitting a patch for loop interchange upstream?

I used the printer on some of the loop interchange tests, but the delinearization does not support some cases there yet. I hope I find some time to look into that next week and maybe also integrating it into loop interchange unless you plan to do so.

If you have time to integrate the analysis in loop interchange next week go ahead. I will need to work through other work in progress I am doing, and upstream those pieces before I can get to loop interchange. I think we can discuss details during the biweekly loop meetings.


greened added inline comments.Jul 31 2019, 12:49 PM

llvm/lib/Analysis/LoopCacheAnalysis.cpp
45	So how will clients with different needs use this analysis? Some might want the definition in the paper but others might want to restrict it to exact memory locations only. A `cl::opt` doesn't allow configuration by clients.

Added a command line option to specify the threshold for temporal reuse.

etiotto marked 2 inline comments as done.Aug 6 2019, 11:59 AM

etiotto added inline comments.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
45	I added an optional parameter to allow users to specify the temporal reuse threshold.
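
A minimal sketch of what such an optional threshold parameter could look like, assuming the default of 2 discussed above and the interpretation that references reuse data temporally when their distance in iterations is below the threshold. `DefaultTemporalReuseThreshold`, `effectiveThreshold` and `hasTemporalReuse` are hypothetical names, and `std::optional` stands in for LLVM's `Optional`.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Stand-in for the value a command-line option would supply (assumed default).
constexpr unsigned DefaultTemporalReuseThreshold = 2;

// Clients may pass their own threshold; otherwise the default applies.
unsigned effectiveThreshold(std::optional<unsigned> TRT = std::nullopt) {
  return TRT.value_or(DefaultTemporalReuseThreshold);
}

// Two references are treated as having temporal reuse when their distance,
// measured in loop iterations, is below the threshold.
bool hasTemporalReuse(int64_t IterationDistance,
                      std::optional<unsigned> TRT = std::nullopt) {
  return std::llabs(IterationDistance) <
         static_cast<long long>(effectiveThreshold(TRT));
}
```

With the default, `a[i]` and `a[i+1]` (distance 1) count as temporal reuse; a client that passes a threshold of 1 restricts reuse to accesses of the exact same location.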

etiotto marked an inline comment as done.Aug 6 2019, 12:00 PM

etiotto updated this revision to Diff 214231.Aug 8 2019, 2:19 PM

Closed by commit rL368439: Title: Loop Cache Analysis (authored by whitneyt). Aug 9 2019, 6:56 AM

This revision was automatically updated to reflect the committed changes.

etiotto updated this revision to Diff 214384.Aug 9 2019, 8:54 AM

/home/xbolva00/LLVM/llvm/lib/Analysis/LoopCacheAnalysis.cpp:110:14: warning: ‘llvm::raw_ostream& llvm::operator<<(llvm::raw_ostream&, const llvm::IndexedReference&)’ has not been declared within llvm
 raw_ostream &llvm::operator<<(raw_ostream &OS, const IndexedReference &R) {
              ^~~~
In file included from /home/xbolva00/LLVM/llvm/lib/Analysis/LoopCacheAnalysis.cpp:28:0:
/home/xbolva00/LLVM/llvm/include/llvm/Analysis/LoopCacheAnalysis.h:45:23: note: only here as a friend
   friend raw_ostream &operator<<(raw_ostream &OS, const IndexedReference &R);
                       ^~~~~~~~
/home/xbolva00/LLVM/llvm/lib/Analysis/LoopCacheAnalysis.cpp:443:14: warning: ‘llvm::raw_ostream& llvm::operator<<(llvm::raw_ostream&, const llvm::CacheCost&)’ has not been declared within llvm
 raw_ostream &llvm::operator<<(raw_ostream &OS, const CacheCost &CC) {
              ^~~~
In file included from /home/xbolva00/LLVM/llvm/lib/Analysis/LoopCacheAnalysis.cpp:28:0:
/home/xbolva00/LLVM/llvm/include/llvm/Analysis/LoopCacheAnalysis.h:174:23: note: only here as a friend
   friend raw_ostream &operator<<(raw_ostream &OS, const CacheCost &CC);

GCC 7 Linux.

Fix build warning for operator<< when using GCC 7.
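
The warning arises because a friend declaration inside a class does not, by itself, declare the operator at namespace scope, so a later definition written with a qualified name has nothing to match. One common way to address this is to add a namespace-scope declaration before the qualified definition. The sketch below reproduces the pattern with `std::ostream` in a hypothetical `demo` namespace; it is not the content of rL368624.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace demo {
class IndexedRef; // hypothetical class standing in for the real one

// Namespace-scope declaration: with this in place, the qualified definition
// further down refers to an operator that is declared within the namespace.
std::ostream &operator<<(std::ostream &OS, const IndexedRef &R);

class IndexedRef {
  friend std::ostream &operator<<(std::ostream &OS, const IndexedRef &R);
  int Id = 7;
};
} // namespace demo

// Qualified definition outside the namespace block. Without the declaration
// above, GCC 7 reports that the operator "has not been declared within" demo.
std::ostream &demo::operator<<(std::ostream &OS, const demo::IndexedRef &R) {
  return OS << "IndexedRef#" << R.Id;
}

// Small helper that streams the object into a string.
std::string render(const demo::IndexedRef &R) {
  std::ostringstream SS;
  SS << R;
  return SS.str();
}
```

An alternative with the same effect is to define the operator inside a `namespace demo { ... }` block instead of using the qualified name.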


I uploaded a fix for this warning. I build with clang (sigh); apologies for the problem.

No problem, it was just a warning :)


If the patch has not been reverted, I think it would be better to create a new patch on Phabricator with the fix, so it is easier to see the change.

greened added inline comments.Aug 13 2019, 10:23 AM

llvm/lib/Analysis/LoopCacheAnalysis.cpp
245	Can we add the value of `MaxDistance` to this message?

Oops, I missed that this landed already. Perhaps a later commit can improve the debug message.

Meinersbur mentioned this in D68789: [LoopNest]: Analysis to discover properties of a loop nest. Oct 28 2019, 11:43 AM

/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 353 warn V612 An unconditional 'return' within a loop.
/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 456 err V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator.

Found by PVS Studio

I think it would be good to get an in-tree user of this soon, to make sure the modeling works as expected on real hardware/benchmarks. Do you have a timeline to get this used? I think you mentioned LoopFusion as one of the first planned users?

What about this? Or still dead code?

In D63459#1731015, @xbolva00 wrote:

/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 353 warn V612 An unconditional 'return' within a loop.

This indeed does not look right. There are breaks and returns in the loop, but no continue, meaning the loop body only runs for one element (in this case, the innermost loop). I think nothing other than the innermost loop is needed for delinearization, in which case it could be made a conditional instead of a for-statement.

/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 456 err V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator.

TRT(TRT == None ? Optional<unsigned>(TemporalReuseThreshold) : TRT),

Could add parens around TRT == None to silence the analyzer.
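
As a hedged illustration of why parentheses only silence the analyzer here: `==` binds more tightly than `?:`, so the unparenthesized expression already parses as intended. The sketch uses `std::optional` (C++17) as a stand-in for `llvm::Optional`, and `DefaultThreshold` is a hypothetical constant standing in for the `TemporalReuseThreshold` option.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the command-line default.
constexpr unsigned DefaultThreshold = 2;

// As written in the patch: parsed as (TRT == nullopt) ? default : TRT.
std::optional<unsigned> pickUnparenthesized(std::optional<unsigned> TRT) {
  return TRT == std::nullopt ? std::optional<unsigned>(DefaultThreshold) : TRT;
}

// With explicit parentheses: identical behavior, clearer to readers and tools.
std::optional<unsigned> pickParenthesized(std::optional<unsigned> TRT) {
  return (TRT == std::nullopt) ? std::optional<unsigned>(DefaultThreshold) : TRT;
}
```

Both variants fall back to the default when no threshold is supplied and pass a supplied threshold through unchanged.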

In D63459#1731032, @Meinersbur wrote:
In D63459#1731015, @xbolva00 wrote:

/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 353 warn V612 An unconditional 'return' within a loop.

This indeed does not look right. There are breaks and returns in the loop, but no continue, meaning the loop is only iterated for one element (in this case: the innermost loop). I think nothing other than the innermost loop is needed for delinearization, in which case it could be made a conditional instead of a for-statement.

/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp 456 err V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator.
TRT(TRT == None ? Optional<unsigned>(TemporalReuseThreshold) : TRT),
Could add parens around TRT == None to silence the analyzer.

I have submitted a patch for review in https://reviews.llvm.org/D69821.

Revision Contents

Path                                            Size
llvm/include/llvm/Analysis/LoopCacheAnalysis.h  280 lines
llvm/lib/Analysis/CMakeLists.txt                1 line
llvm/lib/Analysis/LoopCacheAnalysis.cpp         626 lines
llvm/lib/Passes/PassBuilder.cpp                 1 line
llvm/lib/Passes/PassRegistry.def                2 lines
llvm/test/Analysis/LoopCacheAnalysis/           88 lines, 81 lines, 185 lines, 77 lines, 98 lines (five test files)

Diff 205434

llvm/include/llvm/Analysis/LoopCacheAnalysis.h

This file was added.

				//===- llvm/Analysis/LoopCacheAnalysis.h ------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the interface for the loop cache analysis.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_ANALYSIS_LOOPCACHEANALYSIS_H
				#define LLVM_ANALYSIS_LOOPCACHEANALYSIS_H

				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/DependenceAnalysis.h"
				#include "llvm/Analysis/LoopAnalysisManager.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/raw_ostream.h"

				namespace llvm {

				class LPMUpdater;
				using CacheCostTy = unsigned long long;
				using LoopVectorTy = SmallVector<Loop *, 8>;

				Meinersbur: [style] The current code base most often uses a `Ty` suffix for typedefs.
				etiotto (Author): OK I'll use the Ty suffix.
				/// Represents a memory reference as a base pointer and a set of indexing
				/// operations. For example given the array reference A[i][2j+1][3k+2] in a
				/// 3-dim loop nest:
				/// for(i=0;i<n;++i)
				/// for(j=0;j<m;++j)
				/// for(k=0;k<o;++k)
				/// ... A[i][2j+1][3k+2] ...
				/// We expect:
				/// BasePointer -> A
				/// Subscripts -> [{0,+,1}<%for.i>][{1,+,2}<%for.j>][{2,+,3}<%for.k>]
				/// Sizes -> [m][o][4]
				class IndexedReference {
				friend raw_ostream &operator<<(raw_ostream &OS, const IndexedReference &R);

				public:
				/// Construct an indexed reference given a \p StoreOrLoadInst instruction.
				IndexedReference(Instruction &StoreOrLoadInst, const LoopInfo &LI,
				ScalarEvolution &SE);

				bool isValid() const { return IsValid; }
				const SCEV *getBasePointer() const { return BasePointer; }
				unsigned getNumSubscripts() const { return Subscripts.size(); }
				const SCEV *getSubscript(unsigned SubNum) const {
				Meinersbur: Should return `size_t`
				assert(SubNum < getNumSubscripts() && "Invalid subscript number");
				return Subscripts[SubNum];
				}
				const SCEV *getFirstSubscript() const {
				assert(!Subscripts.empty() && "Expecting non-empty container");
				return Subscripts.front();
				}
				const SCEV *getLastSubscript() const {
				assert(!Subscripts.empty() && "Expecting non-empty container");
				return Subscripts.back();
				}

				/// Return true if the current object and the indexed reference \p Other are
				/// in the same cache line of size \p CLS. This is true iff the distance
				/// between the 2 indexed references in the innermost dimension is less than
				/// the cache line size.
				bool hasSpacialReuse(const IndexedReference &Other, unsigned CLS,
				AliasAnalysis &AA) const;

				/// Return true if the current object and the indexed reference \p Other
				/// have distance smaller than 'TemporalReuseThreshold' in the dimension
				/// associated with the given loop \p L.
				greened: Referencing a `cl::opt` defined in the `.cpp`? It's a little confusing.
				etiotto (Author): I'll pass it as an argument.
				bool hasTemporalReuse(const IndexedReference &Other, const Loop &L,
				DependenceInfo &DI, AliasAnalysis &AA) const;

				/// Compute the cost of the reference w.r.t. the given loop \p L when it is
				/// considered in the innermost position in the loop nest.
				/// The cost is defined as:
				/// - equal to one if the reference is loop invariant, or
				/// - equal to '(TripCount * stride) / cache_line_size' if:
				/// + the reference stride is less than the cache line size, and
				/// + the coefficient of this loop's index variable used in all other
				/// subscripts is zero
				/// - or otherwise equal to 'TripCount'.
				CacheCostTy computeRefCost(const Loop &L, unsigned CLS) const;

				private:
				/// Attempt to delinearize the indexed reference.
				bool delinearize(const LoopInfo &LI);

				/// Return true if the index reference is invariant with respect to loop \p L.
				bool isLoopInvariant(const Loop &L) const;

				/// Return true if the indexed reference is 'consecutive' in loop \p L.
				/// An indexed reference is 'consecutive' if the only coefficient that uses
				/// the loop induction variable is the rightmost one, and the access stride is
				/// smaller than the cache line size \p CLS.
				bool isConsecutive(const Loop &L, unsigned CLS) const;

				/// Compute the trip count for the given loop \p L. Return the SCEV expression
				/// for the trip count or nullptr if it cannot be computed.
				const SCEV *computeTripCount(const Loop &L) const;

				greened: Why is this part of `IndexedReference`?
				etiotto (Author): I changed it into a static function in the .cpp file instead.
				/// Return the coefficient used in the rightmost dimension.
				const SCEV *getLastCoefficient() const;

				/// Return true if the coefficient corresponding to induction variable of
				/// loop \p L in the given \p Subscript is zero or is loop invariant in \p L.
				bool isCoeffForLoopZeroOrInvariant(const SCEV &Subscript,
				const Loop &L) const;

				/// Verify that the given \p Subscript is 'well formed' (must be a simple add
				/// recurrence).
				bool isSimpleAddRecurrence(const SCEV &Subscript, const Loop &L) const;

				/// Return true if the given reference \p Other is definitely aliased with
				/// the indexed reference represented by this class.
				bool isAliased(const IndexedReference &Other, AliasAnalysis &AA) const;

				private:
				/// True if the reference can be delinearized, false otherwise.
				bool IsValid = false;

				/// Represent the memory reference instruction.
				Instruction &StoreOrLoadInst;

				/// The base pointer of the memory reference.
				const SCEV *BasePointer = nullptr;

				/// The subscript (indexes) of the memory reference.
				SmallVector<const SCEV *, 3> Subscripts;

				/// The dimensions of the memory reference.
				SmallVector<const SCEV *, 3> Sizes;

				ScalarEvolution &SE;
				};

				/// A reference group represents a set of memory references that exhibit
				/// temporal or spatial reuse. Two references belong to the same
				/// reference group with respect to an inner loop L iff:
				/// 1. they have a loop independent dependency, or
				/// 2. they have a loop carried dependence with a small dependence distance (e.g.
				/// less than 2) carried by the inner loop, or
				/// 3. they refer to the same array, and their subscripts in the innermost
				/// dimension differ by at most 'd' (where 'd' is less than the cache
				/// line size)
				///
				/// Intuitively a reference group represents memory references that access
				/// the same cache line. Conditions 1,2 above account for temporal reuse, while
				/// condition 3 accounts for spatial reuse.
				using ReferenceGroupTy = SmallVector<std::unique_ptr<IndexedReference>, 8>;
				using ReferenceGroupsTy = SmallVector<std::unique_ptr<ReferenceGroupTy>, 8>;

				Meinersbur: It does not make sense to have owning pointers to SmallVectors. This just adds another level of indirection compared to containing `std::vector` (or `SmallVector` to have a single allocation for all elements if none exceeds the small size) directly.
				/// \c CacheCost represents the estimated cost of an inner loop as the number of
				/// cache lines used by the memory references it contains.
				/// The 'cache cost' of a loop 'L' in a loop nest 'LN' is computed as the sum of
				/// the cache costs of all of its reference groups when the loop is considered
				/// to be in the innermost position in the nest.
				class CacheCost {
				friend raw_ostream &operator<<(raw_ostream &OS, const CacheCost &CC);
				using LoopTripCountTy = std::pair<const Loop *, unsigned>;
				using LoopCacheCostTy = std::pair<const Loop *, CacheCostTy>;

				public:
				static CacheCostTy constexpr InvalidCost = ULLONG_MAX;

				/// Construct a CacheCost object for the loop nest described by \p Loops.
				CacheCost(const LoopVectorTy &Loops, const LoopInfo &LI, ScalarEvolution &SE,
				TargetTransformInfo &TTI, AliasAnalysis &AA, DependenceInfo &DI);
				Meinersbur: [suggestion] Make these magic numbers `cl::opt` options?
				etiotto (Author): Added cl::opt for the default loop trip count and the temporal reuse threshold (see the .cpp file).

				/// Calculate the cache footprint of each loop in the nest (when it is
				/// considered to be in the innermost position).
				void calculateCacheFootprint();

				/// Return the estimated cost of loop \p L if the given loop is part of the
				/// loop nest associated with this object. Return -1 otherwise.
				Meinersbur: `CacheCostTy` is `int64_t`; did you mean to assign `-1`?
				CacheCostTy getLoopCost(const Loop &L) const {
				auto IT = std::find_if(
				LoopCosts.begin(), LoopCosts.end(),
				[&L](const LoopCacheCostTy &LCC) { return LCC.first == &L; });
				return (IT != LoopCosts.end()) ? (*IT).second : -1;
				}

				const SmallVector<LoopCacheCostTy, 3> &getLoopCosts() const {
				return LoopCosts;
				}

				private:
				/// Partition store/load instructions in the loop nest into reference groups.
				/// Two or more memory accesses belong in the same reference group if they
				/// share the same cache line.
				bool populateReferenceGroups(ReferenceGroupsTy &RefGroups) const;

				fhahn: IIRC this returns the ordered loop costs, right? Please document.
				/// Calculate the cost of the given loop \p L assuming it is the innermost
				/// loop in nest.
				Meinersbur: To not leak the internal list implementation, return `ArrayRef<LoopCacheCostTy>`.
				CacheCostTy computeLoopCacheCost(const Loop &L,
				const ReferenceGroupsTy &RefGroups) const;

				/// Compute the cost of a representative reference in reference group \p RG
				/// when the given loop \p L is considered as the innermost loop in the nest.
				/// The computed cost is an estimate for the number of cache lines used by the
				/// reference group. The representative reference cost is defined as:
				/// - equal to one if the reference is loop invariant, or
				/// - equal to '(TripCount * stride) / cache_line_size' if (a) loop \p L's
				/// induction variable is used only in the reference subscript associated
				/// with loop \p L, and (b) the reference stride is less than the cache
				/// line size, or
				/// - TripCount otherwise
				CacheCostTy computeRefGroupCacheCost(const ReferenceGroupTy &RG,
				const Loop &L) const;

				/// Sort the LoopCosts vector by decreasing cache cost.
				void sortLoopCosts() {
				std::sort(LoopCosts.begin(), LoopCosts.end(),
				xbolva00: llvm::sort
				etiotto (Author): Yup agree, thanks for pointing this out.
				fhahn: Do you anticipate most users relying on the loops to be ordered by cost? If not, it might be worth switching to a map to avoid using `find_if` at multiple places (even though not too common in user-written C/C++ code, I've encountered various (auto-generated) sources, which have very deep nesting levels of loops). LoopInterchange for example would be interested in comparing just the total cache cost of various re-orderings of a loop nest.
				etiotto (Author): I was envisioning Loop Interchange wanting to get a sorted vector of cache costs for a nest and then using it to determine the optimal reordering of the loops in the nest (loop with smaller cost in the innermost position, larger cost in the outermost position). Having the cache cost vector sorted is also handy when printing it.
				[](const LoopCacheCostTy &A, const LoopCacheCostTy &B) {
				return A.second > B.second;
				});
				}

				private:
				/// Loops in the loop nest associated with this object.
				LoopVectorTy Loops;

				/// Trip counts for the loops in the loop nest associated with this object.
				SmallVector<LoopTripCountTy, 3> TripCounts;

				/// Cache costs for the loops in the loop nest associated with this object.
				SmallVector<LoopCacheCostTy, 3> LoopCosts;

				const LoopInfo &LI;
				ScalarEvolution &SE;
				TargetTransformInfo &TTI;
				AliasAnalysis &AA;
				DependenceInfo &DI;
				};

				/// Estimate the number of cache lines used by the memory references in a loop.
				/// This analysis starts by classifying memory references in a loop into
				/// reference groups. A reference group represents memory references that fall
				/// into the same cache line. Each reference group is analysed with respect to
				/// the innermost loop in a loop nest. The cost of a reference is defined as
				/// follows:
				/// - one if it is loop invariant w.r.t the innermost loop,
				/// - equal to the loop trip count times the reference stride, divided by the
				/// cache line size (CLS), if the reference stride is less than CLS and the
				/// coefficient of this loop's index variable used in all other subscripts
				/// is zero (i.e. RefCost = TripCount/(CLS/RefStride))
				/// - equal to the innermost loop trip count if the reference stride is greater
				/// than or equal to the cache line size CLS.
				class LoopCacheAnalysis : public AnalysisInfoMixin<LoopCacheAnalysis> {
				friend AnalysisInfoMixin<LoopCacheAnalysis>;
				static AnalysisKey Key;

				public:
				LoopCacheAnalysis() = default;

				using Result = std::unique_ptr<CacheCost>;
				Result run(Loop &L, LoopAnalysisManager &AM, LoopStandardAnalysisResults &AR);
				};

				/// Printer pass for the \c CacheCost results.
				class LoopCachePrinterPass : public PassInfoMixin<LoopCachePrinterPass> {
				raw_ostream &OS;

				public:
				explicit LoopCachePrinterPass(raw_ostream &OS) : OS(OS) {}

				PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR, LPMUpdater &U);
				};

				} // namespace llvm

				#endif // LLVM_ANALYSIS_LOOPCACHEANALYSIS_H
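
The cost rules documented for `computeRefCost` and the sorting done by `sortLoopCosts` can be sketched with plain integers. This is a simplified illustration under stated assumptions, not the patch's implementation: `RefSummary`, `refCost`, `LoopCost` and `sortByDecreasingCost` are hypothetical names, the stride is assumed to be expressed in bytes, and the facts that the real analysis derives from SCEV expressions are supplied directly as fields.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using CacheCostTy = std::uint64_t;

// Hypothetical summary of one memory reference as seen from one candidate
// innermost loop. The real IndexedReference computes these facts from SCEV.
struct RefSummary {
  bool LoopInvariant;        // no subscript depends on the loop's index variable
  bool OnlyInLastSubscript;  // the index variable appears only in the rightmost subscript
  std::uint64_t StrideBytes; // bytes advanced per iteration (assumed unit)
};

// Cost rules from the header comment:
//  - 1 if the reference is loop invariant,
//  - (TripCount * stride) / CLS if the access is consecutive with a stride
//    smaller than the cache line size,
//  - TripCount otherwise.
CacheCostTy refCost(const RefSummary &R, std::uint64_t TripCount,
                    std::uint64_t CLS) {
  assert(CLS > 0 && "cache line size must be positive");
  if (R.LoopInvariant)
    return 1;
  if (R.OnlyInLastSubscript && R.StrideBytes < CLS)
    return (TripCount * R.StrideBytes) / CLS;
  return TripCount;
}

using LoopCost = std::pair<std::string, CacheCostTy>;

// Sort by decreasing cost, so the cheapest candidate for the innermost
// position ends up last.
void sortByDecreasingCost(std::vector<LoopCost> &Costs) {
  std::stable_sort(Costs.begin(), Costs.end(),
                   [](const LoopCost &A, const LoopCost &B) {
                     return A.second > B.second;
                   });
}
```

For example, with a trip count of 100, 8-byte elements and a 64-byte cache line, a consecutive access costs (100 * 8) / 64 = 12 lines (integer division), a non-consecutive one costs 100, and a loop-invariant one costs 1.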

llvm/lib/Analysis/CMakeLists.txt

add_llvm_library(LLVMAnalysis
  [45 lines not shown]
  LazyBlockFrequencyInfo.cpp
  LazyCallGraph.cpp
  LazyValueInfo.cpp
  LegacyDivergenceAnalysis.cpp
  Lint.cpp
  Loads.cpp
  LoopAccessAnalysis.cpp
  LoopAnalysisManager.cpp
+ LoopCacheAnalysis.cpp
  LoopUnrollAnalyzer.cpp
  LoopInfo.cpp
  LoopPass.cpp
  MemDepPrinter.cpp
  MemDerefPrinter.cpp
  MemoryBuiltins.cpp
  MemoryDependenceAnalysis.cpp
  MemoryLocation.cpp
  [43 lines not shown]

llvm/lib/Analysis/LoopCacheAnalysis.cpp

This file was added.

				//===- LoopCacheAnalysis.cpp - Loop Cache Analysis -------------------------==//
				//
				// The LLVM Compiler Infrastructure
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the implementation for the loop cache analysis.
				/// The implementation is largely based on the following paper:
				///
				/// Compiler Optimizations for Improving Data Locality
				/// By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
				/// http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
				///
				/// The general approach taken to estimate the number of cache lines used by the
				/// memory references in an inner loop is:
				/// 1. Partition memory references that exhibit temporal or spatial reuse
				/// into reference groups.
				/// 2. For each loop L in the loop nest LN:
				/// a. Compute the cost of the reference group
				/// b. Compute the loop cost by summing up the reference groups costs
				//===----------------------------------------------------------------------===//

				#include "llvm/Analysis/LoopCacheAnalysis.h"
				#include "llvm/ADT/Sequence.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Support/Debug.h"
				#include <deque>

				using namespace llvm;

				#define DEBUG_TYPE "loop-cache-cost"

				static cl::opt<unsigned> DefaultTripCount(
				"default-trip-count", cl::init(100), cl::Hidden,
				cl::desc("Use this to specify the default trip count of a loop"));

				static cl::opt<unsigned> TemporalReuseThreshold(
				"temporal-reuse-threshold", cl::init(2), cl::Hidden,
				cl::desc("Use this to specify the temporal reuse distance threshold"));

				greened: What units is this in? What does "2" mean?
				etiotto (Author): I clarified the description.
				greened: I'm still a bit confused. Does this mean that `a[i]` and `a[i+1]` will be considered to have temporal reuse with a value of `2`? Perhaps "temporal reuse" itself needs a definition. Are we talking about referencing the same exact memory location, the same cache line, the same page, ...? It seems odd to me that "temporal reuse" would mean anything other than accessing exactly the same memory location. Everything else I would consider to be "spacial reuse." At the very least this deserves a longer comment block about what it means and its implications. Some clients want the definition of "temporal reuse" to be "access the exact same memory location" and this default value seems to mean something very different.
				etiotto (Author): That's correct, a[i] and a[i+1] are considered to have temporal reuse when the threshold is 2. A threshold of 1 would cause only references to the same memory location to have temporal reuse. The analysis attempts to implement the algorithm in the paper mentioned in the summary, and I agree that this is a bit confusing. I will add a comment to attempt to clarify better.
				greened: So how will clients with different needs use this analysis? Some might want the definition in the paper but others might want to restrict it to exact memory locations only. A `cl::opt` doesn't allow configuration by clients.
				etiotto (Author): I added an optional parameter to allow users to specify the temporal reuse threshold.
				/// Populate \p Loops with the loops in the loop nest rooted by loop \p Root.
				static void getLoops(const Loop &Root, LoopVectorTy &Loops) {
				assert(Root.getParentLoop() == nullptr && "Root should be a top level loop");
				etiotto (Author): Will remove this static function and instead use the breadth_first(&Root) iterator from ADT/BreadthFirstIterator.h
				assert(Loops.empty() && "Expecting Loops to be empty");

				Loop *CurrentLoop = const_cast<Loop *>(&Root);
				Loops.push_back(CurrentLoop);
				Meinersbur: Instead of the `const_cast`, why not remove the `const` from the parameter list?
				etiotto (Author): I dropped the const qualifier in the parameter declaration and adjusted the code accordingly.

				// Collect the loops in the incoming nest in breadth-first order.
				std::deque<Loop*> WL;
				WL.push_back(CurrentLoop);
				Meinersbur: `WL` is used as a stack, a (Small)Vector could do this as well.

				while (!WL.empty()) {
				CurrentLoop = WL.back();
				WL.pop_back();
				const auto *SubLoops = &CurrentLoop->getSubLoops();
				if (SubLoops->empty())
				continue;

				Loops.insert(Loops.end(), SubLoops->begin(), SubLoops->end());
				Meinersbur: Is passing a single innermost loop a precondition or what this function computes? It seems the only loop that this function might return is `Loops.back()` and kind-of verifies that the loop vector is in the right order?!?
				etiotto (Author): This function attempts to retrieve the innermost loop in the given loop vector. It returns a nullptr if any loop in the loop vector supplied has more than one sibling. The loop vector is expected to contain loops collected in breadth-first order. I'll improve the comment.
				WL.insert(WL.begin(), SubLoops->rbegin(), SubLoops->rend());
				}
				}

				/// Retrieve the innermost loop in the given loop nest \p Loops. It returns a
				/// nullptr if any loop in the loop vector supplied has more than one sibling.
				/// The loop vector is expected to contain loops collected in breadth-first
				/// order.
				static const Loop *getInnerMostLoop(const LoopVectorTy &Loops) {
				assert(!Loops.empty() && "Expecting a non-empty loop vector");

				const Loop *LastLoop = Loops.back();
				const Loop *ParentLoop = LastLoop->getParentLoop();

				if (ParentLoop == nullptr) {
				assert(Loops.size() == 1 && "Expecting a single loop");
				return LastLoop;
				}

				return (std::is_sorted(Loops.begin(), Loops.end(),
				[](const Loop* L1, const Loop *L2) {
				return L1->getLoopDepth() < L2->getLoopDepth();
				})) ? LastLoop : nullptr;
				}

				//===----------------------------------------------------------------------===//
				// IndexedReference implementation
				//
				raw_ostream &llvm::operator<<(raw_ostream &OS, const IndexedReference &R) {
				if (!R.IsValid) {
				OS << R.StoreOrLoadInst;
				OS << ", IsValid=false.";
				return OS;
				}

				OS << *R.BasePointer;
				for (auto *Subscript : R.Subscripts)
				OS << "[" << *Subscript << "]";
				Meinersbur: Don’t “almost always” use auto (https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more…): `for (const SCEV *Subscript : R.Subscripts)`. You seem to have applied the coding standard for auto everywhere except in foreach-loops.
				etiotto (Author): OK I'll fix the use of auto in foreach-loops.

				OS << ", Sizes: ";
				for (auto *Size : R.Sizes)
				OS << "[" << *Size << "]";

				return OS;
				}

				IndexedReference::IndexedReference(Instruction &StoreOrLoadInst,
				const LoopInfo &LI,
				ScalarEvolution &SE)
				: IsValid(false), StoreOrLoadInst(StoreOrLoadInst), BasePointer(nullptr),
				Subscripts(), Sizes(), SE(SE) {
				assert((isa<StoreInst>(StoreOrLoadInst) || isa<LoadInst>(StoreOrLoadInst)) &&
				Meinersbur: No need for `Subscripts(), Sizes()`. Prefer default member initializers at declaration over `IsValid(false), BasePointer(nullptr)`.
				"Expecting a load or store instruction");

				IsValid = delinearize(LI);
				if (IsValid)
				Meinersbur: [subjective] I'd prefer a function that returns either a `nullptr` or an `ErrorOr<IndexedReference>` rather than an `IsValid` flag for every object.
				LLVM_DEBUG(dbgs().indent(2)
				<< "Succesfully delinearized: " << *this << "\n");
				}

				bool IndexedReference::hasSpacialReuse(const IndexedReference &Other,
				unsigned CLS, AliasAnalysis &AA) const {
				assert(IsValid && "Expecting a valid reference");

				if (BasePointer != Other.getBasePointer() && !isAliased(Other, AA)) {
				LLVM_DEBUG(dbgs().indent(2)
				<< "No spacial reuse: different base pointer\n");
				return false;
				}

				unsigned NumSubscripts = getNumSubscripts();
				if (NumSubscripts != Other.getNumSubscripts()) {
				LLVM_DEBUG(
				dbgs().indent(2)
				<< "No spacial reuse: different number of subscripts\n");
				return false;
				}

				// all subscripts must be equal, except the rightmost one (the last one).
				for (auto SubNum : seq<unsigned>(0, NumSubscripts - 1)) {
				if (getSubscript(SubNum) != Other.getSubscript(SubNum)) {
				Meinersbur: Nice, this is where the coding standard recommends `auto`.
				etiotto (Author): Thanks.
				LLVM_DEBUG(dbgs().indent(2)
				<< "No spacial reuse, different subscripts: "
				<< "\n\t" << *getSubscript(SubNum) << "\n\t"
				<< *Other.getSubscript(SubNum) << "\n");
				return false;
				}
				}

				// the difference between the last subscripts must be less than the cache line
				// size
				const SCEV *LastSubscript = getLastSubscript();
				const SCEV *OtherLastSubscript = Other.getLastSubscript();
				const SCEVConstant *Diff = dyn_cast<SCEVConstant>(
				SE.getMinusSCEV(LastSubscript, OtherLastSubscript));
				if (Diff == nullptr) {
				LLVM_DEBUG(
				dbgs().indent(2)
				<< "No spacial reuse, difference between subscript:\n\t"
				<< *LastSubscript << "\n\t" << OtherLastSubscript
				<< "\nis not constant.\n");
				return false;
				}
				Meinersbur: Don’t “almost always” use auto (https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more…)

				bool InSameCacheLine = (Diff->getValue()->getSExtValue() < CLS);

				if (InSameCacheLine)
				LLVM_DEBUG(dbgs().indent(2) << "Found spacial reuse.\n");
				else
				LLVM_DEBUG(dbgs().indent(2) << "No spacial reuse.\n");

				Meinersbur: Some compilers may complain about empty statements in non-debug builds. Alternative: `LLVM_DEBUG({ if (InSameCacheLine) dbgs().indent(2) << "Found spacial reuse.\n"; else dbgs().indent(2) << "No spacial reuse.\n"; });`
				etiotto (Author): Agree. It is cleaner the way you propose as there is only one LLVM_DEBUG.
				return InSameCacheLine;
				}

				bool IndexedReference::hasTemporalReuse(const IndexedReference &Other,
				const Loop &L, DependenceInfo &DI,
				AliasAnalysis &AA) const {
				assert(IsValid && "Expecting a valid reference");

				if (BasePointer != Other.getBasePointer() && !isAliased(Other, AA)) {
				LLVM_DEBUG(dbgs().indent(2)
				<< "No temporal reuse: different base pointer\n");
				return false;
				}

				greened: This seems a bit dangerous to me. Depending on the client, we might want to assume reuse if we don't know the distance. Could this function return a tribool of (yes/no/unknown)?
				etiotto (Author): If the dependence distance is unknown at compile time the references are conservatively considered to have no spacial reuse, and consequently the analysis will overestimate the number of cache lines used by the loop (when it is in the innermost position in the nest). For now I can return an Optional<bool> (didn't find a tribool data type readily available in LLVM). This will make it easier to place references with 'unknown' distance in the same reference group if we find a motivating test case that needs it.
				auto D = DI.depends(&StoreOrLoadInst, &Other.StoreOrLoadInst, true);
				if (D == nullptr) {
				LLVM_DEBUG(dbgs().indent(2) << "No temporal reuse: no dependence\n");
				return false;
				}

				if (D->isLoopIndependent()) {
				LLVM_DEBUG(dbgs().indent(2) << "Found temporal reuse\n");
				return true;
				}

				// Check the dependence distance at every loop level. There is temporal reuse
				// if the distance at the given loop's depth is small (|d| <= 2) and it is
				// zero at every other loop level.
				unsigned LoopDepth = L.getLoopDepth();
				unsigned Levels = D->getLevels();
				for (unsigned Level = 1; Level <= Levels; ++Level) {
				const SCEV *Distance = D->getDistance(Level);
				Meinersbur (Done): Prefer `int` over `unsigned`.
				const SCEVConstant *SCEVConst = dyn_cast_or_null<SCEVConstant>(Distance);

				if (SCEVConst == nullptr) {
				LLVM_DEBUG(dbgs().indent(2) << "No temporal reuse: distance unknown\n");
				return false;
				}

				const ConstantInt &CI = *SCEVConst->getValue();
				if (Level != LoopDepth && !CI.isZero()) {
				LLVM_DEBUG(dbgs().indent(2)
				<< "No temporal reuse: distance is not zero at depth=" << Level
				<< "\n");
				return false;
				} else if (Level == LoopDepth &&
				CI.getSExtValue() > TemporalReuseThreshold) {
				LLVM_DEBUG(dbgs().indent(2)
				<< "No temporal reuse: distance is greater than 2 at depth="
				<< Level << "\n");
				return false;
				}
				}

				LLVM_DEBUG(dbgs().indent(2) << "Found temporal reuse\n");
				return true;
				}
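The tri-state result discussed in the inline comments on this function can be sketched without LLVM types. This is a minimal illustration, not the code in the patch: `DistanceVector`, the standalone `hasTemporalReuse`, and the `MaxDistance` parameter are hypothetical stand-ins, and `std::optional<bool>` plays the role of the `Optional<bool>` the author mentions. The sketch follows the `|d| <= 2` reading from the code comment by taking the absolute value, whereas the listing above compares the signed value against `TemporalReuseThreshold`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <vector>

// Hypothetical stand-in for one dependence: Distances[Level - 1] holds the
// distance at loop level `Level`, or std::nullopt when the distance is not a
// compile-time constant.
using DistanceVector = std::vector<std::optional<int64_t>>;

// Returns true or false when reuse can be decided, and std::nullopt when a
// distance is unknown, so that the client can pick its own policy.
// MaxDistance is an assumed name for the reuse threshold.
std::optional<bool> hasTemporalReuse(const DistanceVector &Distances,
                                     unsigned LoopDepth,
                                     int64_t MaxDistance = 2) {
  assert(LoopDepth >= 1 && "Loop depths are 1-based");
  for (unsigned Level = 1; Level <= Distances.size(); ++Level) {
    const std::optional<int64_t> &D = Distances[Level - 1];
    if (!D)
      return std::nullopt; // Unknown distance: neither yes nor no.
    if (Level != LoopDepth && *D != 0)
      return false; // Nonzero distance at another loop level.
    if (Level == LoopDepth && std::llabs(*D) > MaxDistance)
      return false; // Distance too large at the loop of interest.
  }
  return true;
}
```

A client that wants the conservative behaviour described by the author can treat the unknown case as no reuse with `hasTemporalReuse(...).value_or(false)`, while a client that prefers to assume reuse can use `value_or(true)`.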

				CacheCostTy IndexedReference::computeRefCost(const Loop &L,
				unsigned CLS) const {
				assert(IsValid && "Expecting a valid reference");
				LLVM_DEBUG(dbgs().indent(2) << "Computing cache cost for:\n";
				dbgs().indent(4) << *this << "\n");

				greened (Done): The debug message should reference `MaxDistance` and not hard-code the value `2`.
				// If the indexed reference is loop invariant the cost is one.
				if (isLoopInvariant(L)) {
				LLVM_DEBUG(dbgs().indent(4) << "Reference is loop invariant: RefCost=1\n");
				return 1;
				}
				greened (Not Done): Can we add the value of `MaxDistance` to this message?

				// If the indexed reference is 'consecutive' the cost is
				// (TripCount*Stride)/CLS, otherwise the cost is TripCount.
				const SCEV *TripCount = computeTripCount(L);
				const SCEV *RefCost = TripCount;

				LLVM_DEBUG(dbgs() << "TripCount=" << *TripCount << "\n");

				if (isConsecutive(L, CLS)) {
				const SCEV *Coeff = getLastCoefficient();
				const SCEV *ElemSize = Sizes.back();
				const SCEV *Stride = SE.getMulExpr(Coeff, ElemSize);
				fhahn (Done): RefCost is guaranteed to be a SCEVConstant here, so it would be better to use cast<> instead. Or even better:
				    if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
				      return ConstantCost->getValue()->getSExtValue();
				    LLVM_DEBUG(dbgs().indent(4)
				               << "RefCost is not a constant! Setting to RefCost=InvalidCost "
				                  "(invalid value).\n");
				    return CacheCost::InvalidCost;
				const SCEV *CacheLineSize = SE.getConstant(Stride->getType(), CLS);
				const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
				RefCost = SE.getUDivExpr(Numerator, CacheLineSize);
				LLVM_DEBUG(dbgs().indent(4)
				<< "Access is consecutive: RefCost=(TripCount*Stride)/CLS="
				<< *RefCost << "\n");
				} else
				LLVM_DEBUG(dbgs().indent(4)
				<< "Access is not consecutive: RefCost=TripCount=" << *RefCost
				<< "\n");

				fhahn (Done): It looks like this function does not capture much and might be better as a separate member function?
				etiotto (Author, Done): OK, I'll make it a private member function.
				// Attempt to fold RefCost into a constant.
				RefCost = dyn_cast<SCEVConstant>(RefCost);
				if (RefCost == nullptr) {
				LLVM_DEBUG(dbgs().indent(4)
				<< "RefCost is not a constant! Setting to RefCost=InvalidCost "
				"(invalid value).\n");
				return CacheCost::InvalidCost;
				fhahn (Done): nit: Capitalize start of sentence.
				}

				return dyn_cast<SCEVConstant>(RefCost)->getValue()->getSExtValue();
				}
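The cost rule in `computeRefCost` reduces to integer arithmetic once the SCEV expressions fold to constants. The following is a minimal sketch under that assumption, using plain integers in place of SCEVs; `RefInfo` and its fields are hypothetical names introduced for illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-integer summary of one reference with respect to a loop.
struct RefInfo {
  bool LoopInvariant;   // The address does not change across iterations.
  bool Consecutive;     // Only the last subscript varies and the stride < CLS.
  uint64_t StrideBytes; // Last coefficient multiplied by the element size.
};

// Mirrors the three cases in the listing above: 1 for a loop-invariant
// reference, (TripCount * Stride) / CLS for a consecutive one, and TripCount
// otherwise.
uint64_t computeRefCost(const RefInfo &R, uint64_t TripCount, uint64_t CLS) {
  assert(CLS > 0 && "Cache line size must be nonzero");
  if (R.LoopInvariant)
    return 1;
  if (R.Consecutive)
    return (TripCount * R.StrideBytes) / CLS; // Unsigned, truncating division.
  return TripCount;
}
```

For instance, a 4-byte stride over 100 iterations with a 128-byte cache line gives (100 * 4) / 128 = 3 cache lines; these particular values are illustrative.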

				bool IndexedReference::delinearize(const LoopInfo &LI) {
				fhahn (Done): nit: Capitalize start of sentence.
				assert(Subscripts.empty() && "Subscripts should be empty");
				assert(Sizes.empty() && "Sizes should be empty");
				assert(!IsValid && "Should be called once from the constructor");
				LLVM_DEBUG(dbgs() << "Delinearizing: " << StoreOrLoadInst << "\n");

				const SCEV *ElemSize = SE.getElementSize(&StoreOrLoadInst);

				auto isOneDimensionalArray = [&](const SCEV *AccessFn, const Loop *L) {
				const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(AccessFn);
				if (AR == nullptr || !AR->isAffine())
				return false;

				assert(AR->getLoop() && "AR should have a loop");

				// check that start and increment are not add recurrences.
				const SCEV *Start = AR->getStart();
				const SCEV *Step = AR->getStepRecurrence(SE);
				if (isa<SCEVAddRecExpr>(Start) || isa<SCEVAddRecExpr>(Step))
				return false;

				// check that start and increment are both invariant in the loop.
				if (!SE.isLoopInvariant(Start, L) || !SE.isLoopInvariant(Step, L))
				return false;

				return AR->getStepRecurrence(SE) == ElemSize;
				};

				Meinersbur (Done): [nit] double space
				const BasicBlock *BB = StoreOrLoadInst.getParent();
				for (Loop *L = LI.getLoopFor(BB); L != nullptr; L = L->getParentLoop()) {
				const SCEV *AccessFn =
				SE.getSCEVAtScope(getPointerOperand(&StoreOrLoadInst), L);

				BasePointer = dyn_cast<SCEVUnknown>(SE.getPointerBase(AccessFn));
				if (BasePointer == nullptr) {
				LLVM_DEBUG(
				dbgs().indent(2)
				<< "ERROR: failed to delinearize, can't identify base pointer\n");
				return false;
				}
				Meinersbur (Done): [style] `Subscripts.empty()`. `Sizes.empty()`

				AccessFn = SE.getMinusSCEV(AccessFn, BasePointer);

				LLVM_DEBUG(dbgs().indent(2) << "In Loop '" << L->getName()
				<< "', AccessFn: " << *AccessFn << "\n");

				SE.delinearize(AccessFn, Subscripts, Sizes,
				SE.getElementSize(&StoreOrLoadInst));

				if (Subscripts.empty() || Sizes.empty() ||
				Subscripts.size() != Sizes.size()) {
				// Attempt to determine whether we have a single dimensional array access.
				// before giving up.
				Meinersbur (Done): [style] Did you mean break/return here? The next iteration might add elements to it. I'd structure this as
				    if (!isOneDimensionalArray(AccessFn, L)) {
				      ...
				      break;
				    }
				    ...
				conforming to https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code.
				if (!isOneDimensionalArray(AccessFn, L)) {
				LLVM_DEBUG(dbgs().indent(2)
				<< "ERROR: failed to delinearize reference\n");
				Subscripts.clear();
				Meinersbur (Done): [style] `return all_of(Subscripts, ...`
				Sizes.clear();
				break;
				}

				const SCEV *Div = SE.getUDivExactExpr(AccessFn, ElemSize);
				Subscripts.push_back(Div);
				Sizes.push_back(ElemSize);
				}

				return all_of(Subscripts, [&](const SCEV *Subscript) {
				return isSimpleAddRecurrence(*Subscript, *L);
				});
				}

				return false;
				}

				bool IndexedReference::isLoopInvariant(const Loop &L) const {
				Value *Addr = getPointerOperand(&StoreOrLoadInst);
				assert(Addr != nullptr && "Expecting either a load or a store instruction");
				assert(SE.isSCEVable(Addr->getType()) && "Addr should be SCEVable");

				if (SE.isLoopInvariant(SE.getSCEV(Addr), &L))
				return true;

				// The indexed reference is loop invariant if none of the coefficients use
				// the loop induction variable.
				bool allCoeffForLoopAreZero = std::all_of(
				xbolva00 (Done): llvm::all_of
				Subscripts.begin(), Subscripts.end(), [&](const SCEV *Subscript) {
				return isCoeffForLoopZeroOrInvariant(*Subscript, L);
				});

				return allCoeffForLoopAreZero;
				}

				bool IndexedReference::isConsecutive(const Loop &L, unsigned CLS) const {
				// The indexed reference is 'consecutive' if the only coefficient that uses
				// the loop induction variable is the last one...
				const SCEV *LastSubscript = Subscripts.back();
				for (const SCEV *Subscript : Subscripts) {
				if (Subscript == LastSubscript)
				continue;
				if (!isCoeffForLoopZeroOrInvariant(*Subscript, L))
				return false;
				}

				// ...and the access stride is less than the cache line size.
				const SCEV *Coeff = getLastCoefficient();
				const SCEV *ElemSize = Sizes.back();
				const SCEV *Stride = SE.getMulExpr(Coeff, ElemSize);
				Meinersbur (Done): Why not use `getBackedgeTakenCount()`?
				etiotto (Author, Done): I'll do that.
				const SCEV *CacheLineSize = SE.getConstant(Stride->getType(), CLS);

				return SE.isKnownPredicate(ICmpInst::ICMP_ULT, Stride, CacheLineSize);
				}

				const SCEV *IndexedReference::computeTripCount(const Loop &L) const {
				const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(&L);
				if (isa<SCEVCouldNotCompute>(BackedgeTakenCount) ||
				!isa<SCEVConstant>(BackedgeTakenCount)) {
				LLVM_DEBUG(dbgs() << "Trip count of loop " << L.getName()
				<< " could not be computed");
				const SCEV *ElemSize = Sizes.back();
				return SE.getConstant(ElemSize->getType(), DefaultTripCount);
				}

				return SE.getAddExpr(BackedgeTakenCount,
				SE.getOne(BackedgeTakenCount->getType()));
				}

				const SCEV *IndexedReference::getLastCoefficient() const {
				const SCEV *LastSubscript = getLastSubscript();
				assert(isa<SCEVAddRecExpr>(LastSubscript) &&
				"Expecting a SCEV add recurrence expression");
				const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(LastSubscript);
				return AR->getStepRecurrence(SE);
				}

				bool IndexedReference::isCoeffForLoopZeroOrInvariant(const SCEV &Subscript,
				const Loop &L) const {
				const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(&Subscript);
				return (AR != nullptr) ? AR->getLoop() != &L
				: SE.isLoopInvariant(&Subscript, &L);
				}

				bool IndexedReference::isSimpleAddRecurrence(const SCEV &Subscript,
				const Loop &L) const {
				if (!isa<SCEVAddRecExpr>(Subscript))
				return false;

				const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(&Subscript);
				assert(AR->getLoop() && "AR should have a loop");

				if (!AR->isAffine())
				return false;

				const SCEV *Start = AR->getStart();
				const SCEV *Step = AR->getStepRecurrence(SE);

				if (!SE.isLoopInvariant(Start, &L) || !SE.isLoopInvariant(Step, &L))
				return false;

				return true;
				}

				bool IndexedReference::isAliased(const IndexedReference &Other,
				AliasAnalysis &AA) const {
				const auto &Loc1 = MemoryLocation::get(&StoreOrLoadInst);
				const auto &Loc2 = MemoryLocation::get(&Other.StoreOrLoadInst);
				return AA.isMustAlias(Loc1, Loc2);
				}

				//===----------------------------------------------------------------------===//
				// CacheCost implementation
				//
				raw_ostream &llvm::operator<<(raw_ostream &OS, const CacheCost &CC) {
				for (auto LC : CC.LoopCosts) {
				const Loop *L = LC.first;
				OS << "Loop '" << L->getName() << "' has cost = " << LC.second << "\n";
				}
				return OS;
				fhahn (Done): \n at the end?
				}

				CacheCost::CacheCost(const LoopVectorTy &Loops, const LoopInfo &LI,
				ScalarEvolution &SE, TargetTransformInfo &TTI,
				AliasAnalysis &AA, DependenceInfo &DI)
				: Loops(Loops), TripCounts(), LoopCosts(), LI(LI), SE(SE), TTI(TTI), AA(AA),
				DI(DI) {
				assert(!Loops.empty() && "Expecting a non-empty loop vector.");

				for (const Loop *L : Loops) {
				unsigned TripCount = SE.getSmallConstantTripCount(L);
				TripCount = (TripCount == 0) ? DefaultTripCount : TripCount;
				TripCounts.push_back({L, TripCount});
				}

				fhahn (Done): This should be passed in, I think, and the users should request it via the pass manager.
				calculateCacheFootprint();
				}

				void CacheCost::calculateCacheFootprint() {
				LLVM_DEBUG(dbgs() << "POPULATING REFERENCE GROUPS\n");
				ReferenceGroupsTy RefGroups;
				if (!populateReferenceGroups(RefGroups))
				return;

				LLVM_DEBUG(dbgs() << "COMPUTING LOOP CACHE COSTS\n");
				for (const Loop *L : Loops) {
				auto IT =
				std::find_if(LoopCosts.begin(), LoopCosts.end(),
				[L](const LoopCacheCostTy &LCC) { return LCC.first == L; });
				assert(IT == LoopCosts.end() && "Should not add duplicate element");
				CacheCostTy LoopCost = computeLoopCacheCost(*L, RefGroups);
				LoopCosts.push_back(std::make_pair(L, LoopCost));
				}

				sortLoopCosts();
				RefGroups.clear();
				}

				bool CacheCost::populateReferenceGroups(ReferenceGroupsTy &RefGroups) const {
				assert(RefGroups.empty() && "Reference groups should be empty");

				const unsigned CLS = TTI.getCacheLineSize();
				const Loop *InnerMostLoop = getInnerMostLoop(Loops);
				Meinersbur (Done): The LLVM code base does not use `const` for stack variables.
				etiotto (Author, Done): const-qualifying local variables preempts unintended modifications later in the function... but I do not feel strongly about it. I'll change it.
				assert(InnerMostLoop != nullptr && "Expecting a valid innermost loop");

				for (BasicBlock *BB : InnerMostLoop->getBlocks()) {
				for (Instruction &I : *BB) {
				if (!isa<StoreInst>(I) && !isa<LoadInst>(I))
				continue;

				std::unique_ptr<IndexedReference> R(new IndexedReference(I, LI, SE));
				if (!R->isValid())
				continue;

				bool Added = false;
				for (auto &RefGroup : RefGroups) {
				const IndexedReference &Representative = *RefGroup->front().get();
				LLVM_DEBUG(dbgs() << "References:\n"; dbgs().indent(2) << *R << "\n";
				dbgs().indent(2) << Representative << "\n");

				if (R->hasTemporalReuse(Representative, *InnerMostLoop, DI, AA) ||
				R->hasSpacialReuse(Representative, CLS, AA)) {
				RefGroup->push_back(std::move(R));
				Added = true;
				break;
				}
				}

				if (!Added) {
				std::unique_ptr<ReferenceGroupTy> RG(new ReferenceGroupTy());
				RG->push_back(std::move(R));
				RefGroups.push_back(std::move(RG));
				}
				}
				}
				Meinersbur (Done): Since this only has code that executes in assert-builds, it should be guarded entirely by an `LLVM_DEBUG`.

				if (RefGroups.empty())
				return false;

				LLVM_DEBUG(dbgs() << "\nIDENTIFIED REFERENCE GROUPS:\n";
				unsigned n = 1;
				Meinersbur (Done): Do you need wrapping behavior? `int` instead of `unsigned`.
				for (const auto &RG : RefGroups) {
				dbgs().indent(2) << "RefGroup " << n << ":\n";
				for (const auto &IR : *RG)
				dbgs().indent(4) << *IR << "\n";
				n++;
				}
				dbgs() << "\n");

				return true;
				}
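The grouping loop in `populateReferenceGroups` is a greedy, first-fit partition: each reference joins the first group whose representative (front element) it has reuse with, and otherwise starts a new group. A minimal sketch of that shape follows; `groupReferences` and the `HasReuse` predicate are hypothetical names, and the predicate stands in for the combined temporal and spatial reuse checks.

```cpp
#include <cassert>
#include <vector>

// Greedy first-fit grouping. HasReuse(R, Representative) is an assumed
// callable standing in for the reuse checks in the listing above.
template <typename Ref, typename ReusePred>
std::vector<std::vector<Ref>> groupReferences(const std::vector<Ref> &Refs,
                                              ReusePred HasReuse) {
  std::vector<std::vector<Ref>> Groups;
  for (const Ref &R : Refs) {
    bool Added = false;
    for (std::vector<Ref> &Group : Groups) {
      assert(!Group.empty() && "Groups always hold at least one member");
      // Only the first member of the group is consulted.
      if (HasReuse(R, Group.front())) {
        Group.push_back(R);
        Added = true;
        break;
      }
    }
    // No existing group matched, so R becomes the representative of a new one.
    if (!Added)
      Groups.push_back(std::vector<Ref>{R});
  }
  return Groups;
}
```

Because only the representative is compared, the resulting partition can depend on the order in which references are visited when the reuse relation is not transitive.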

				CacheCostTy
				CacheCost::computeLoopCacheCost(const Loop &L,
				const ReferenceGroupsTy &RefGroups) const {
				if (!L.isLoopSimplifyForm())
				return InvalidCost;

				LLVM_DEBUG(dbgs() << "Considering loop '" << L.getName()
				<< "' as innermost loop.\n");

				// Compute the product of the trip counts of each other loop in the nest.
				CacheCostTy TripCountsProduct = 1;
				for (auto TC : TripCounts) {
				if (TC.first == &L)
				continue;
				TripCountsProduct *= TC.second;
				Meinersbur (Done): Would it be useful to make `CacheCostTy` an `int64_t`? At least with UBSan we could diagnose an overflow.
				etiotto (Author, Done): OK.
				}

				CacheCostTy LoopCost = 0;
				for (const auto &RG : RefGroups) {
				CacheCostTy RefGroupCost = computeRefGroupCacheCost(*RG, L);
				LoopCost += RefGroupCost * TripCountsProduct;
				}

				LLVM_DEBUG(dbgs().indent(2)
				<< "Loop '" << L.getName() << "' has cost=" << LoopCost << "\n");

				return LoopCost;
				}
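The arithmetic in `computeLoopCacheCost` can be checked by hand against the expected values in `loads-store.ll`. The sketch below uses plain integers; the free function and its parameters are hypothetical. The worked numbers rest on assumptions that this excerpt does not state: a default trip count of 100 for each loop, a 128-byte cache line, 4-byte elements, and three reference groups (the load and store of `A` together, `B`, and `C`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// GroupCosts holds one cost per reference group for the candidate innermost
// loop; OtherTripCounts holds the trip counts of every other loop in the nest.
int64_t computeLoopCacheCost(const std::vector<int64_t> &GroupCosts,
                             const std::vector<int64_t> &OtherTripCounts) {
  int64_t TripCountsProduct = 1;
  for (int64_t TC : OtherTripCounts) {
    assert(TC > 0 && "Trip counts are expected to be positive");
    TripCountsProduct *= TC;
  }

  // Each group's cost is scaled by the iterations of the enclosing loops.
  int64_t LoopCost = 0;
  for (int64_t Cost : GroupCosts)
    LoopCost += Cost * TripCountsProduct;
  return LoopCost;
}
```

Under those assumptions, with `for.j` innermost the `A` and `B` groups are consecutive and cost (100 * 4) / 128 = 3 each, while `C` costs 100, giving (3 + 3 + 100) * 100 * 100 = 1060000. With `for.k` innermost the roles swap, giving (100 + 100 + 3) * 10000 = 2030000, and with `for.i` innermost no group is consecutive, giving 300 * 10000 = 3000000. These match the CHECK lines of the test.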

				CacheCostTy CacheCost::computeRefGroupCacheCost(const ReferenceGroupTy &RG,
				const Loop &L) const {
				assert(!RG.empty() && "Reference group should have at least one member.");

				const IndexedReference *Representative = RG.front().get();
				return Representative->computeRefCost(L, TTI.getCacheLineSize());
				}

				//===----------------------------------------------------------------------===//
				// LoopCacheAnalysis implementation
				//
				AnalysisKey LoopCacheAnalysis::Key;

				LoopCacheAnalysis::Result
				Meinersbur (Done): If the intention is to provide analysis for innermost loops only, why this limitation? Could it just return an analysis result for each innermost loop? If the analysis requires a global view to determine the cost for each loop, wouldn't a FunctionPass be more appropriate? Currently, it seems users first need to get the LoopCacheAnalysis for a topmost loop, then ask it for one of its nested loops. Are such loop nests not analyzable at all?
				    while (repeat) {
				      for (int i = ...)
				        for (int j = ...)
				          B[i][j] = ... A[i+1][j+1] ... // stencil
				      for (int i = ...)
				        for (int j = ...)
				          A[i][j] = ... B[i+1][j+1] ... // stencil
				    }
				etiotto (Author, Done): The current scope of this PR is to analyze loop nests that have a single innermost loop. The analysis returns a vector of loop costs for each loop in a loop nest. This is not a hard requirement; however, I would like to extend the scope of the transformation in a future PR. One of the scenarios I had in mind, at least initially, as a consumer of this analysis is loop interchange, which operates on perfect nests, and therefore the current implementation of the analysis is sufficient for that use. If we were to make the analysis a function pass, that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive, especially considering that loop interchange is currently a loop pass.
				fhahn (Done): I'll try to have a closer look in a few days, but I think for the use in LoopInterchange, it would not need to be an analysis pass (I suppose there won't be a big benefit to preserving this as an analysis?). I think a lightweight interface to query the cost for certain valid permutations would be sufficient. I think it would be great if we only compute the information when directly required (i.e. we only need to compute the costs for loops we can interchange, for the permutations valid to interchange).
				etiotto (Author, Done): Looking forward to your comments. You are correct, we could teach loop interchange about the locality of accesses in a loop nest. There are several other loop transformations that can benefit from it. The paper that introduces the analysis (Compiler Optimizations for Improving Data Locality - http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf) illustrates how the analysis was used to guide several other loop transformations, namely loop reversal, loop fusion, and loop distribution. Given the applicability of the analysis to several transformations, I think it would make sense to centralize the code as an analysis pass.
				fhahn (Done):
				> Given the applicability of the analysis to several transformation I think it would make sense to centralize the code as an analysis pass.
				I agree it is a useful utility to have. I am just wondering what the benefit of exposing it as an analysis pass would be, as it is unlikely that the result would be used for most loops or could be cached between interested transforms frequently. IMO it would be fine as a utility class/function, i.e. just provide a `getLoopCacheCost()` function that takes a root loop and maybe a potential re-ordering of loops, which the interested transforms can use only on the loops that they can transform. I think that would reduce the size of the patch a bit and focus on the important bits.
				etiotto (Author, Done): I was thinking that having it as an analysis pass would allow loop transformations that do not modify the memory references in a loop nest to preserve the analysis... I am OK with just adding a member function to the CacheCost class to construct and return the cache cost for a loop nest rooted at a given loop. I will upstream a patch to make that change and for the time being avoid making this an analysis pass.
				Meinersbur (Done):
				> If we were to make the analysis a function pass that would force consumers to become function passes (or use the cached version of this analysis). That seems overly restrictive especially considering that loop interchange is currently a loop pass.
				I don't think the new pass manager has this restriction. Passes can ask for analyses of any level using OuterAnalysisManagerProxy. The new pass manager caches everything.
				etiotto (Author, Done): Yes, it is also my understanding (from the comments in PassManager.h) that the new pass manager allows an "inner" pass (e.g. a Loop Pass) to ask for an "outer" analysis (e.g. a Function Analysis). However, at least from the comments in PassManager.h, the inner pass cannot cause the outer analysis to run and can only rely on the cached version, which may give back a nullptr.
				Meinersbur (Done): I convinced myself that indeed a FunctionPass might not be that nice, because any LoopPass would need to preserve it. As @fhahn points out, wanting to preserve/reuse the analysis might be rare.
				etiotto (Author, Done): I will upstream a patch as suggested by @fhahn to just provide a member function to the CacheCost class to construct and return the cache cost for a loop nest rooted at a given loop.
				LoopCacheAnalysis::run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR) {
				// Analyze loop nests in their entirety.
				if (L.getParentLoop() != nullptr)
				return nullptr;

				LoopVectorTy Loops;
				getLoops(L, Loops);
				assert(!Loops.empty() && "Failed to retrieve loops in the nest");

				if (getInnerMostLoop(Loops) == nullptr) {
				LLVM_DEBUG(dbgs() << "Cannot compute cache cost of loop nest with more "
				"than one innermost loop\n");
				return nullptr;
				}

				Function *F = L.getHeader()->getParent();
				DependenceInfo DI(F, &AR.AA, &AR.SE, &AR.LI);

				return make_unique<CacheCost>(Loops, AR.LI, AR.SE, AR.TTI, AR.AA, DI);
				}

				//===----------------------------------------------------------------------===//
				// LoopCachePrinterPass implementation
				//
				PreservedAnalyses LoopCachePrinterPass::run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR,
				LPMUpdater &U) {
				auto &CacheCost = AM.getResult<LoopCacheAnalysis>(L, AR);
				if (CacheCost != nullptr)
				OS << *CacheCost;

				return PreservedAnalyses::all();
				}

llvm/lib/Passes/PassBuilder.cpp

	#include "llvm/Analysis/DemandedBits.h"
	#include "llvm/Analysis/DependenceAnalysis.h"
	#include "llvm/Analysis/DominanceFrontier.h"
	#include "llvm/Analysis/GlobalsModRef.h"
	#include "llvm/Analysis/IVUsers.h"
	#include "llvm/Analysis/LazyCallGraph.h"
	#include "llvm/Analysis/LazyValueInfo.h"
	#include "llvm/Analysis/LoopAccessAnalysis.h"
				#include "llvm/Analysis/LoopCacheAnalysis.h"
	#include "llvm/Analysis/LoopInfo.h"
	#include "llvm/Analysis/MemoryDependenceAnalysis.h"
	#include "llvm/Analysis/MemorySSA.h"
	#include "llvm/Analysis/ModuleSummaryAnalysis.h"
	#include "llvm/Analysis/OptimizationRemarkEmitter.h"
	#include "llvm/Analysis/PhiValues.h"
	#include "llvm/Analysis/PostDominators.h"
	#include "llvm/Analysis/ProfileSummaryInfo.h"

llvm/lib/Passes/PassRegistry.def

	#undef FUNCTION_PASS_WITH_PARAMS

	#ifndef LOOP_ANALYSIS
	#define LOOP_ANALYSIS(NAME, CREATE_PASS)
	#endif
	LOOP_ANALYSIS("no-op-loop", NoOpLoopAnalysis())
	LOOP_ANALYSIS("access-info", LoopAccessAnalysis())
	LOOP_ANALYSIS("ivusers", IVUsersAnalysis())
				LOOP_ANALYSIS("loop-cache-cost", LoopCacheAnalysis())
	LOOP_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
	#undef LOOP_ANALYSIS

	#ifndef LOOP_PASS
	#define LOOP_PASS(NAME, CREATE_PASS)
	#endif
	LOOP_PASS("invalidate<all>", InvalidateAllAnalysesPass())
	LOOP_PASS("licm", LICMPass())
	LOOP_PASS("loop-idiom", LoopIdiomRecognizePass())
	LOOP_PASS("loop-instsimplify", LoopInstSimplifyPass())
	LOOP_PASS("rotate", LoopRotatePass())
	LOOP_PASS("no-op-loop", NoOpLoopPass())
	LOOP_PASS("print", PrintLoopPass(dbgs()))
	LOOP_PASS("loop-deletion", LoopDeletionPass())
	LOOP_PASS("simplify-cfg", LoopSimplifyCFGPass())
	LOOP_PASS("strength-reduce", LoopStrengthReducePass())
	LOOP_PASS("indvars", IndVarSimplifyPass())
	LOOP_PASS("irce", IRCEPass())
	LOOP_PASS("unroll-and-jam", LoopUnrollAndJamPass())
	LOOP_PASS("unroll-full", LoopFullUnrollPass())
	LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))
	LOOP_PASS("print<ivusers>", IVUsersPrinterPass(dbgs()))
				LOOP_PASS("print<loop-cache-cost>", LoopCachePrinterPass(dbgs()))
	LOOP_PASS("loop-predication", LoopPredicationPass())
	LOOP_PASS("guard-widening", GuardWideningPass())
	#undef LOOP_PASS

	#ifndef LOOP_PASS_WITH_PARAMS
	#define LOOP_PASS_WITH_PARAMS(NAME, CREATE_PASS, PARSER)
	#endif
	LOOP_PASS_WITH_PARAMS("unswitch",
	[](bool NonTrivial) {
	return SimpleLoopUnswitchPass(NonTrivial);
	},
	parseLoopUnswitchOptions)
	#undef LOOP_PASS_WITH_PARAMS

llvm/test/Analysis/LoopCacheAnalysis/loads-store.ll

This file was added.

				; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s

				greened (Not Done): I'd like to see block comments at the top of all of these tests explaining what they are testing. It will be much easier to understand what's going on when these tests fail.
				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-unknown-linux-gnu"

				fhahn (Done): We need to make sure that the PPC backend was built when running those tests. Usually we put target-specific tests in target subdirectories, with a local lit.local.cfg checking for the backend (e.g. see llvm/test/Transforms/LoopVectorize/PowerPC/lit.local.cfg).
				; void foo(long n, long m, long o, int A[n][m][o], int B[n][m][o], int C[n][m][o]) {
				; for (long i = 0; i < n; i++)
				; for (long j = 0; j < m; j++)
				; for (long k = 0; k < o; k++)
				; A[i][k][j] += B[i][k][j] + C[i][j][k];
				; }

				; CHECK-DAG:Loop 'for.i' has cost = 3000000
				; CHECK-DAG:Loop 'for.k' has cost = 2030000
				; CHECK-DAG:Loop 'for.j' has cost = 1060000

				define void @foo(i64 %n, i64 %m, i64 %o, i32* %A, i32* %B, i32* %C) {
				entry:
				%cmp32 = icmp sgt i64 %n, 0
				%cmp230 = icmp sgt i64 %m, 0
				%cmp528 = icmp sgt i64 %o, 0
				br i1 %cmp32, label %for.cond1.preheader.lr.ph, label %for.end

				for.cond1.preheader.lr.ph: ; preds = %entry
				br i1 %cmp230, label %for.i.preheader, label %for.end

				for.i.preheader: ; preds = %for.cond1.preheader.lr.ph
				br i1 %cmp528, label %for.i.preheader.split, label %for.end

				for.i.preheader.split: ; preds = %for.i.preheader
				br label %for.i

				for.i: ; preds = %for.inci, %for.i.preheader.split
				%i = phi i64 [ %inci, %for.inci ], [ 0, %for.i.preheader.split ]
				%muli = mul i64 %i, %m
				br label %for.j

				for.j: ; preds = %for.incj, %for.i
				%j = phi i64 [ %incj, %for.incj ], [ 0, %for.i ]
				%addj = add i64 %muli, %j
				%mulj = mul i64 %addj, %o
				br label %for.k

				for.k: ; preds = %for.k, %for.j
				%k = phi i64 [ 0, %for.j ], [ %inck, %for.k ]

				; B[i][k][j]
				%addk = add i64 %muli, %k
				%mulk = mul i64 %addk, %o
				%arrayidx1 = add i64 %j, %mulk
				%arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %arrayidx1
				%elem_B = load i32, i32* %arrayidx2, align 4

				; C[i][j][k]
				%arrayidx3 = add i64 %k, %mulj
				%arrayidx4 = getelementptr inbounds i32, i32* %C, i64 %arrayidx3
				%elem_C = load i32, i32* %arrayidx4, align 4

				; A[i][k][j]
				%arrayidx5 = getelementptr inbounds i32, i32* %A, i64 %arrayidx1
				%elem_A = load i32, i32* %arrayidx5, align 4

				; A[i][k][j] += B[i][k][j] + C[i][j][k]
				%add1 = add i32 %elem_B, %elem_C
				%add2 = add i32 %add1, %elem_A
				%arrayidx6 = getelementptr inbounds i32, i32* %A, i64 %arrayidx1
				store i32 %add2, i32* %arrayidx6, align 4

				%inck = add nsw i64 %k, 1
				%exitcond.us = icmp eq i64 %inck, %o
				br i1 %exitcond.us, label %for.incj, label %for.k

				for.incj: ; preds = %for.k
				%incj = add nsw i64 %j, 1
				%exitcond54.us = icmp eq i64 %incj, %m
				br i1 %exitcond54.us, label %for.inci, label %for.j

				for.inci: ; preds = %for.incj
				%inci = add nsw i64 %i, 1
				%exitcond55.us = icmp eq i64 %inci, %n
				br i1 %exitcond55.us, label %for.end.loopexit, label %for.i

				for.end.loopexit: ; preds = %for.inci
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %for.cond1.preheader.lr.ph, %entry
				ret void
				}
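
fhahn's inline comment on loads-store.ll suggests guarding target-specific tests with a lit.local.cfg that checks for the backend. A minimal sketch of such a guard follows. In a real lit.local.cfg the `config` object is supplied by lit; the `require_target` helper and the `make_config` stand-in are hypothetical and exist only so the snippet runs on its own.

```python
from types import SimpleNamespace


def require_target(config, target):
    """Mark the directory's tests unsupported unless `target` was built."""
    if target not in config.root.targets:
        config.unsupported = True


def make_config(built_targets):
    """Hypothetical stand-in for the object lit passes to lit.local.cfg."""
    root = SimpleNamespace(targets=built_targets)
    return SimpleNamespace(root=root, unsupported=False)


# A build without the PowerPC backend: the tests should be skipped.
without_ppc = make_config(["X86", "AArch64"])
require_target(without_ppc, "PowerPC")

# A build that includes the PowerPC backend: the tests stay enabled.
with_ppc = make_config(["X86", "PowerPC"])
require_target(with_ppc, "PowerPC")

print(without_ppc.unsupported, with_ppc.unsupported)
```

In an actual lit.local.cfg the body would reduce to the membership check on `config.root.targets` followed by setting `config.unsupported`.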

llvm/test/Analysis/LoopCacheAnalysis/matmul.ll

This file was added.

				; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-unknown-linux-gnu"

				; void matmul(long n, long m, long o, int A[n][m], int B[n][m], int C[n][m]) {
				; for (long i = 0; i < n; i++)
				; for (long j = 0; j < m; j++)
				; for (long k = 0; k < o; k++)
				; C[i][j] = C[i][j] + A[i][k] * B[k][j];
				; }

				; CHECK-DAG:Loop 'for.i' has cost = 2010000
				; CHECK-DAG:Loop 'for.j' has cost = 70000
				; CHECK-DAG:Loop 'for.k' has cost = 1040000

				define void @matmul(i64 %n, i64 %m, i64 %o, i32* %A, i32* %B, i32* %C) {
				entry:
				br label %for.i

				for.i: ; preds = %entry, %for.inc.i
				%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc.i ]
				%muli = mul i64 %i, %m
				br label %for.j

				for.j: ; preds = %for.i, %for.inc.j
				%j = phi i64 [ 0, %for.i ], [ %j.next, %for.inc.j ]
				%addj = add i64 %muli, %j
				%mulj = mul i64 %addj, %o
				br label %for.k

				for.k: ; preds = %for.j, %for.inc.k
				%k = phi i64 [ 0, %for.j ], [ %k.next, %for.inc.k ]

				; A[i][k]
				%arrayidx3 = add i64 %k, %muli
				%arrayidx4 = getelementptr inbounds i32, i32* %A, i64 %arrayidx3
				%elem_A = load i32, i32* %arrayidx4, align 4

				; B[k][j]
				%mulk = mul i64 %k, %o
				%arrayidx5 = add i64 %j, %mulk
				%arrayidx6 = getelementptr inbounds i32, i32* %B, i64 %arrayidx5
				%elem_B = load i32, i32* %arrayidx6, align 4

				; C[i][j]
				%arrayidx7 = add i64 %j, %muli
				%arrayidx8 = getelementptr inbounds i32, i32* %C, i64 %arrayidx7
				%elem_C = load i32, i32* %arrayidx8, align 4

				; C[i][j] = C[i][j] + A[i][k] * B[k][j];
				%mul = mul nsw i32 %elem_A, %elem_B
				%add = add nsw i32 %elem_C, %mul
				store i32 %add, i32* %arrayidx8, align 4

				br label %for.inc.k

				for.inc.k: ; preds = %for.k
				%k.next = add nuw nsw i64 %k, 1
				%exitcond = icmp ne i64 %k.next, %o
				br i1 %exitcond, label %for.k, label %for.end

				for.end: ; preds = %for.inc.k
				br label %for.inc.j

				for.inc.j: ; preds = %for.end
				%j.next = add nuw nsw i64 %j, 1
				%exitcond5 = icmp ne i64 %j.next, %m
				br i1 %exitcond5, label %for.j, label %for.end23

				for.end23: ; preds = %for.inc.j
				br label %for.inc.i

				for.inc.i: ; preds = %for.end23
				%i.next = add nuw nsw i64 %i, 1
				%exitcond8 = icmp ne i64 %i.next, %n
				br i1 %exitcond8, label %for.i, label %for.end26

				for.end26: ; preds = %for.inc.i
				ret void
				}

llvm/test/Analysis/LoopCacheAnalysis/matvecmul.ll

This file was added.

				; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-unknown-linux-gnu"

				; void matvecmul(double * __restrict y, const double * __restrict x, const double * __restrict b,
				; const int * __restrict nb, const int * __restrict nx, const int * __restrict ny, const int * __restrict nz) {
				;
				; for (int k=1;k<nz;++k)
				; for (int j=1;j<ny;++j)
				; for (int i=1;i<nx;++i)
				; for (int l=1;l<nb;++l)
				; for (int m=1;m<nb;++m)
				; y[k+1][j][i][l] = y[k+1][j][i][l] + b[k][j][i][m][l]*x[k][j][i][m];
				; }

				; CHECK-DAG: Loop 'k_loop' has cost = 30000000000
				; CHECK-DAG: Loop 'j_loop' has cost = 30000000000
				; CHECK-DAG: Loop 'i_loop' has cost = 30000000000
				; CHECK-DAG: Loop 'm_loop' has cost = 10700000000
				; CHECK-DAG: Loop 'l_loop' has cost = 1300000000

				%_elem_type_of_double = type <{ double }>

				; Function Attrs: norecurse nounwind
				define void @mat_vec_mpy([0 x %_elem_type_of_double]* noalias %y, [0 x %_elem_type_of_double]* noalias readonly %x,
				[0 x %_elem_type_of_double]* noalias readonly %b, i32* noalias readonly %nb, i32* noalias readonly %nx,
				i32* noalias readonly %ny, i32* noalias readonly %nz) {
				mat_times_vec_entry:
				%_ind_val = load i32, i32* %nb, align 4
				%_conv = sext i32 %_ind_val to i64
				%_grt_tmp.i = icmp sgt i64 %_conv, 0
				%a_b.i = select i1 %_grt_tmp.i, i64 %_conv, i64 0
				%_ind_val1 = load i32, i32* %nx, align 4
				%_conv2 = sext i32 %_ind_val1 to i64
				%_grt_tmp.i266 = icmp sgt i64 %_conv2, 0
				%a_b.i267 = select i1 %_grt_tmp.i266, i64 %_conv2, i64 0
				%_ind_val3 = load i32, i32* %ny, align 4
				%_conv4 = sext i32 %_ind_val3 to i64
				%_grt_tmp.i264 = icmp sgt i64 %_conv4, 0
				%a_b.i265 = select i1 %_grt_tmp.i264, i64 %_conv4, i64 0
				%_ind_val5 = load i32, i32* %nz, align 4
				%_mult_tmp = shl nsw i64 %a_b.i, 3
				%_mult_tmp7 = mul i64 %_mult_tmp, %a_b.i267
				%_mult_tmp8 = mul i64 %_mult_tmp7, %a_b.i265
				%_sub_tmp = sub nuw nsw i64 -8, %_mult_tmp
				%_sub_tmp21 = sub i64 %_sub_tmp, %_mult_tmp7
				%_sub_tmp23 = sub i64 %_sub_tmp21, %_mult_tmp8
				%_mult_tmp73 = mul i64 %_mult_tmp, %a_b.i
				%_mult_tmp74 = mul i64 %_mult_tmp73, %a_b.i267
				%_mult_tmp75 = mul i64 %_mult_tmp74, %a_b.i265
				%_sub_tmp93 = sub i64 %_sub_tmp, %_mult_tmp73
				%_sub_tmp95 = sub i64 %_sub_tmp93, %_mult_tmp74
				%_sub_tmp97 = sub i64 %_sub_tmp95, %_mult_tmp75
				%_grt_tmp853288 = icmp slt i32 %_ind_val5, 1
				br i1 %_grt_tmp853288, label %_return_bb, label %k_loop.lr.ph

				k_loop.lr.ph: ; preds = %mat_times_vec_entry
				%_grt_tmp851279 = icmp slt i32 %_ind_val3, 1
				%_grt_tmp847270 = icmp slt i32 %_ind_val, 1
				%_aa_conv = bitcast [0 x %_elem_type_of_double]* %y to i8*
				%_adda_ = getelementptr inbounds i8, i8* %_aa_conv, i64 %_sub_tmp23
				%_aa_conv434 = bitcast [0 x %_elem_type_of_double]* %x to i8*
				%_adda_435 = getelementptr inbounds i8, i8* %_aa_conv434, i64 %_sub_tmp23
				%_aa_conv785 = bitcast [0 x %_elem_type_of_double]* %b to i8*
				%_adda_786 = getelementptr inbounds i8, i8* %_aa_conv785, i64 %_sub_tmp97
				br i1 %_grt_tmp851279, label %k_loop.us.preheader, label %k_loop.lr.ph.split

				k_loop.us.preheader: ; preds = %k_loop.lr.ph
				br label %_return_bb.loopexit

				k_loop.lr.ph.split: ; preds = %k_loop.lr.ph
				%_grt_tmp849273 = icmp slt i32 %_ind_val1, 1
				br i1 %_grt_tmp849273, label %k_loop.us291.preheader, label %k_loop.lr.ph.split.split

				k_loop.us291.preheader: ; preds = %k_loop.lr.ph.split
				br label %_return_bb.loopexit300

				k_loop.lr.ph.split.split: ; preds = %k_loop.lr.ph.split
				br i1 %_grt_tmp847270, label %k_loop.us294.preheader, label %k_loop.preheader

				k_loop.preheader: ; preds = %k_loop.lr.ph.split.split
				%0 = add i32 %_ind_val, 1
				%1 = add i32 %_ind_val1, 1
				%2 = add i32 %_ind_val3, 1
				%3 = add i32 %_ind_val5, 1
				br label %k_loop

				k_loop.us294.preheader: ; preds = %k_loop.lr.ph.split.split
				br label %_return_bb.loopexit301

				k_loop: ; preds = %k_loop._label_18_crit_edge.split.split.split, %k_loop.preheader
				%indvars.iv316 = phi i64 [ 1, %k_loop.preheader ], [ %indvars.iv.next317, %k_loop._label_18_crit_edge.split.split.split ]
				%indvars.iv.next317 = add nuw nsw i64 %indvars.iv316, 1
				%_ix_x_len = mul i64 %_mult_tmp8, %indvars.iv.next317
				%_ix_x_len410 = mul i64 %_mult_tmp75, %indvars.iv316
				%_ix_x_len822 = mul i64 %_mult_tmp8, %indvars.iv316
				br label %j_loop

				j_loop: ; preds = %j_loop._label_15_crit_edge.split.split, %k_loop
				%indvars.iv312 = phi i64 [ %indvars.iv.next313, %j_loop._label_15_crit_edge.split.split ], [ 1, %k_loop ]
				%_ix_x_len371 = mul i64 %_mult_tmp7, %indvars.iv312
				%_ix_x_len415 = mul i64 %_mult_tmp74, %indvars.iv312
				br label %i_loop

				i_loop: ; preds = %i_loop._label_12_crit_edge.split, %j_loop
				%indvars.iv307 = phi i64 [ %indvars.iv.next308, %i_loop._label_12_crit_edge.split ], [ 1, %j_loop ]
				%_ix_x_len375 = mul i64 %_mult_tmp, %indvars.iv307
				%_ix_x_len420 = mul i64 %_mult_tmp73, %indvars.iv307
				br label %l_loop

				l_loop: ; preds = %l_loop._label_9_crit_edge, %i_loop
				%indvars.iv303 = phi i64 [ %indvars.iv.next304, %l_loop._label_9_crit_edge ], [ 1, %i_loop ]
				%_ix_x_len378 = shl nuw nsw i64 %indvars.iv303, 3
				br label %m_loop

				m_loop: ; preds = %m_loop, %l_loop
				%indvars.iv = phi i64 [ %indvars.iv.next, %m_loop ], [ 1, %l_loop ]
				%_ix_x_len424 = mul i64 %_mult_tmp, %indvars.iv
				%_ix_x_len454 = shl nuw nsw i64 %indvars.iv, 3
				%_ixa_gep = getelementptr inbounds i8, i8* %_adda_, i64 %_ix_x_len
				%_ixa_gep791 = getelementptr inbounds i8, i8* %_adda_786, i64 %_ix_x_len410
				%_ixa_gep823 = getelementptr inbounds i8, i8* %_adda_435, i64 %_ix_x_len822
				%_ixa_gep372 = getelementptr inbounds i8, i8* %_ixa_gep, i64 %_ix_x_len371
				%_ixa_gep376 = getelementptr inbounds i8, i8* %_ixa_gep372, i64 %_ix_x_len375
				%_ixa_gep796 = getelementptr inbounds i8, i8* %_ixa_gep791, i64 %_ix_x_len415
				%_ixa_gep828 = getelementptr inbounds i8, i8* %_ixa_gep823, i64 %_ix_x_len371
				%_ixa_gep379 = getelementptr inbounds i8, i8* %_ixa_gep376, i64 %_ix_x_len378
				%_ixa_gep801 = getelementptr inbounds i8, i8* %_ixa_gep796, i64 %_ix_x_len420
				%_ixa_gep833 = getelementptr inbounds i8, i8* %_ixa_gep828, i64 %_ix_x_len375
				%_ixa_gep806 = getelementptr inbounds i8, i8* %_ixa_gep801, i64 %_ix_x_len378
				%_ixa_gep810 = getelementptr inbounds i8, i8* %_ixa_gep806, i64 %_ix_x_len424
				%_gepp = bitcast i8* %_ixa_gep379 to double*
				%_gepp813 = bitcast i8* %_ixa_gep810 to double*
				%_ind_val814 = load double, double* %_gepp813, align 8
				%_ixa_gep837 = getelementptr inbounds i8, i8* %_ixa_gep833, i64 %_ix_x_len454
				%_gepp840 = bitcast i8* %_ixa_gep837 to double*
				%_ind_val841 = load double, double* %_gepp840, align 8
				%_mult_tmp842 = fmul double %_ind_val814, %_ind_val841
				store double %_mult_tmp842, double* %_gepp, align 8
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%wide.trip.count = zext i32 %0 to i64
				%wide.trip.count305 = zext i32 %0 to i64
				%wide.trip.count309 = zext i32 %1 to i64
				%wide.trip.count314 = zext i32 %2 to i64
				%wide.trip.count319 = zext i32 %3 to i64
				%exitcond = icmp ne i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond, label %m_loop, label %l_loop._label_9_crit_edge

				l_loop._label_9_crit_edge: ; preds = %m_loop
				%indvars.iv.next304 = add nuw nsw i64 %indvars.iv303, 1
				%exitcond306 = icmp ne i64 %indvars.iv.next304, %wide.trip.count305
				br i1 %exitcond306, label %l_loop, label %i_loop._label_12_crit_edge.split

				i_loop._label_12_crit_edge.split: ; preds = %l_loop._label_9_crit_edge
				%indvars.iv.next308 = add nuw nsw i64 %indvars.iv307, 1
				%exitcond310 = icmp ne i64 %indvars.iv.next308, %wide.trip.count309
				br i1 %exitcond310, label %i_loop, label %j_loop._label_15_crit_edge.split.split

				j_loop._label_15_crit_edge.split.split: ; preds = %i_loop._label_12_crit_edge.split
				%indvars.iv.next313 = add nuw nsw i64 %indvars.iv312, 1
				%exitcond315 = icmp ne i64 %indvars.iv.next313, %wide.trip.count314
				br i1 %exitcond315, label %j_loop, label %k_loop._label_18_crit_edge.split.split.split

				k_loop._label_18_crit_edge.split.split.split: ; preds = %j_loop._label_15_crit_edge.split.split
				%exitcond320 = icmp ne i64 %indvars.iv.next317, %wide.trip.count319
				br i1 %exitcond320, label %k_loop, label %_return_bb.loopexit302

				_return_bb.loopexit: ; preds = %k_loop.us.preheader
				br label %_return_bb

				_return_bb.loopexit300: ; preds = %k_loop.us291.preheader
				br label %_return_bb

				_return_bb.loopexit301: ; preds = %k_loop.us294.preheader
				br label %_return_bb

				_return_bb.loopexit302: ; preds = %k_loop._label_18_crit_edge.split.split.split
				br label %_return_bb

				_return_bb: ; preds = %_return_bb.loopexit302, %_return_bb.loopexit301, %_return_bb.loopexit300, %_return_bb.loopexit, %mat_times_vec_entry
				ret void
				}

llvm/test/Analysis/LoopCacheAnalysis/single-store.ll

This file was added.

				; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-unknown-linux-gnu"

				; void foo(long n, long m, long o, int A[n][m][o]) {
				; for (long i = 0; i < n; i++)
				; for (long j = 0; j < m; j++)
				; for (long k = 0; k < o; k++)
				; A[2*i+3][3*j-4][2*k+7] = 1;
				; }

				; CHECK-DAG: Loop 'for.i' has cost = 1000000
				; CHECK-DAG: Loop 'for.j' has cost = 1000000
				; CHECK-DAG: Loop 'for.k' has cost = 60000

				define void @foo(i64 %n, i64 %m, i64 %o, i32* %A) {
				entry:
				%cmp32 = icmp sgt i64 %n, 0
				%cmp230 = icmp sgt i64 %m, 0
				%cmp528 = icmp sgt i64 %o, 0
				br i1 %cmp32, label %for.cond1.preheader.lr.ph, label %for.end

				for.cond1.preheader.lr.ph: ; preds = %entry
				br i1 %cmp230, label %for.i.preheader, label %for.end

				for.i.preheader: ; preds = %for.cond1.preheader.lr.ph
				br i1 %cmp528, label %for.i.preheader.split, label %for.end

				for.i.preheader.split: ; preds = %for.i.preheader
				br label %for.i

				for.i: ; preds = %for.inci, %for.i.preheader.split
				%i = phi i64 [ %inci, %for.inci ], [ 0, %for.i.preheader.split ]
				%mul8 = shl i64 %i, 1
				%add9 = add nsw i64 %mul8, 3
				%0 = mul i64 %add9, %m
				%sub = add i64 %0, -4
				br label %for.j

				for.j: ; preds = %for.incj, %for.i
				%j = phi i64 [ %incj, %for.incj ], [ 0, %for.i ]
				%mul7 = mul nsw i64 %j, 3
				%tmp = add i64 %sub, %mul7
				%tmp27 = mul i64 %tmp, %o
				br label %for.k

				for.k: ; preds = %for.k, %for.j
				%k = phi i64 [ 0, %for.j ], [ %inck, %for.k ]

				%mul = mul nsw i64 %k, 2
				%arrayidx.sum = add i64 %mul, 7
				%arrayidx10.sum = add i64 %arrayidx.sum, %tmp27
				%arrayidx11 = getelementptr inbounds i32, i32* %A, i64 %arrayidx10.sum
				store i32 1, i32* %arrayidx11, align 4

				%inck = add nsw i64 %k, 1
				%exitcond.us = icmp eq i64 %inck, %o
				br i1 %exitcond.us, label %for.incj, label %for.k

				for.incj: ; preds = %for.k
				%incj = add nsw i64 %j, 1
				%exitcond54.us = icmp eq i64 %incj, %m
				br i1 %exitcond54.us, label %for.inci, label %for.j

				for.inci: ; preds = %for.incj
				%inci = add nsw i64 %i, 1
				%exitcond55.us = icmp eq i64 %inci, %n
				br i1 %exitcond55.us, label %for.end.loopexit, label %for.i

				for.end.loopexit: ; preds = %for.inci
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %for.cond1.preheader.lr.ph, %entry
				ret void
				}
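
The CHECK-DAG values in single-store.ll can be reproduced by hand under a few assumptions that the test itself does not state: an unknown trip count is treated as 100, the cache line is 128 bytes, a reference with a known constant stride smaller than the cache line costs (trip count * stride) / cache line size, and any other reference costs one cache line per iteration. The helper names below are hypothetical. This is a sketch of one reading that is consistent with the expected output, not the analysis's actual implementation.

```python
DEFAULT_TRIP_COUNT = 100   # assumed value used when a trip count is unknown
CACHE_LINE_SIZE = 128      # assumed cache line size in bytes
ELEM_SIZE = 4              # size in bytes of the i32 stored by the test


def ref_cost(stride_bytes):
    """Cache lines touched by one reference over a full innermost loop.

    stride_bytes is None when the stride is not a known constant,
    for example when it depends on the symbolic sizes m or o.
    """
    if stride_bytes is not None and stride_bytes < CACHE_LINE_SIZE:
        return (DEFAULT_TRIP_COUNT * stride_bytes) // CACHE_LINE_SIZE
    return DEFAULT_TRIP_COUNT


def loop_cost(stride_bytes, outer_loops=2):
    """Scale the reference cost by the trip counts of the other loops."""
    return ref_cost(stride_bytes) * DEFAULT_TRIP_COUNT ** outer_loops


# A[2*i+3][3*j-4][2*k+7]: only k steps through memory with a constant stride.
costs = {
    "for.i": loop_cost(None),            # stride of 2*m*o elements, symbolic
    "for.j": loop_cost(None),            # stride of 3*o elements, symbolic
    "for.k": loop_cost(2 * ELEM_SIZE),   # stride of 2 elements = 8 bytes
}
print(costs)
```

Under these assumptions the sketch yields 1000000 for `for.i` and `for.j` and 60000 for `for.k`, matching the CHECK-DAG lines of this test.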

llvm/test/Analysis/LoopCacheAnalysis/stencil.ll

This file was added.

				; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s

				target datalayout = "e-m:e-i64:64-n32:64"
				target triple = "powerpc64le-unknown-linux-gnu"

				; void foo(long n, long m, int A[n][m], int B[n][m], int C[n]) {
				; for (long i = 0; i < n; i++)
				; for (long j = 0; j < m; j++) {
				; A[i][j] = A[i][j+1] + B[i-1][j] + B[i-1][j+1] + C[i];
				; A[i][j] += B[i][i];
				; }
				; }

				; CHECK-DAG: Loop 'for.i' has cost = 20600
				; CHECK-DAG: Loop 'for.j' has cost = 800

				define void @foo(i64 %n, i64 %m, i32* %A, i32* %B, i32* %C) {
				entry:
				%cmp32 = icmp sgt i64 %n, 0
				%cmp230 = icmp sgt i64 %m, 0
				br i1 %cmp32, label %for.cond1.preheader.lr.ph, label %for.end

				for.cond1.preheader.lr.ph: ; preds = %entry
				br i1 %cmp230, label %for.i.preheader, label %for.end

				for.i.preheader: ; preds = %for.cond1.preheader.lr.ph
				br label %for.i

				for.i: ; preds = %for.inci, %for.i.preheader
				%i = phi i64 [ %inci, %for.inci ], [ 0, %for.i.preheader ]
				%subione = sub i64 %i, 1
				%addione = add i64 %i, 1
				%muli = mul i64 %i, %m
				%muliminusone = mul i64 %subione, %m
				%muliplusone = mul i64 %addione, %m
				br label %for.j

				for.j: ; preds = %for.incj, %for.i
				%j = phi i64 [ %incj, %for.incj ], [ 0, %for.i ]
				%addj = add i64 %muli, %j

				; B[i-1][j]
				%arrayidx1 = add i64 %j, %muliminusone
				%arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %arrayidx1
				%elem_B1 = load i32, i32* %arrayidx2, align 4

				; B[i-1][j+1]
				%addjone = add i64 %j, 1
				%arrayidx3 = add i64 %addjone, %muliminusone
				%arrayidx4 = getelementptr inbounds i32, i32* %B, i64 %arrayidx3
				%elem_B2 = load i32, i32* %arrayidx4, align 4

				; C[i]
				%arrayidx6 = getelementptr inbounds i32, i32* %C, i64 %i
				%elem_C = load i32, i32* %arrayidx6, align 4

				; A[i][j+1]
				%arrayidx7 = add i64 %addjone, %muli
				%arrayidx8 = getelementptr inbounds i32, i32* %A, i64 %arrayidx7
				%elem_A = load i32, i32* %arrayidx8, align 4

				; A[i][j] = A[i][j+1] + B[i-1][j] + B[i-1][j+1] + C[i]
				%addB = add i32 %elem_B1, %elem_B2
				%addC = add i32 %addB, %elem_C
				%addA = add i32 %elem_A, %elem_C
				%arrayidx9 = add i64 %j, %muli
				%arrayidx10 = getelementptr inbounds i32, i32* %A, i64 %arrayidx9
				store i32 %addA, i32* %arrayidx10, align 4

				; A[i][j] += B[i][i];
				%arrayidx11 = add i64 %j, %muli
				%arrayidx12 = getelementptr inbounds i32, i32* %A, i64 %arrayidx11
				%elem_A1 = load i32, i32* %arrayidx12, align 4
				%arrayidx13 = add i64 %i, %muli
				%arrayidx14 = getelementptr inbounds i32, i32* %B, i64 %arrayidx13
				%elem_B3 = load i32, i32* %arrayidx14, align 4
				%addA1 = add i32 %elem_A1, %elem_B3
				store i32 %addA1, i32* %arrayidx12, align 4

				br label %for.incj

				for.incj: ; preds = %for.j
				%incj = add nsw i64 %j, 1
				%exitcond54.us = icmp eq i64 %incj, %m
				br i1 %exitcond54.us, label %for.inci, label %for.j

				for.inci: ; preds = %for.incj
				%inci = add nsw i64 %i, 1
				%exitcond55.us = icmp eq i64 %inci, %n
				br i1 %exitcond55.us, label %for.end.loopexit, label %for.i

				for.end.loopexit: ; preds = %for.inci
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %for.cond1.preheader.lr.ph, %entry
				ret void
				}

Diff 205434