Page MenuHomePhabricator

Cache aware Loop Cost Analysis
Needs RevisionPublic

Authored by tvvikram on Jun 8 2016, 12:57 AM.

Details

Summary

Implement an analysis pass that calculates loop cost based on cache data.

The patch basically creates groups of references that would lie in the same cache line. Each group is then analysed with respect to innermost loops considering cache lines. Penalty for the reference is:
a. 1, if the reference is invariant with the innermost loop,
b. TripCount for non-unit stride access,
c. TripCount / CacheLineSize for a unit-stride access.
Loop Cost is then calculated as the sum of the reference penalties times the product of the loop bounds of the outer loops. This loop cost can then be used as a profitability measure for cache reuse related optimizations. This is just a brief description; please refer to http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf for the details.

Current drawbacks:
a. Static use of CacheLineSize.
b. Only perfect nests are handled.
c. Only single bb of innermost loop is considered.
d. Add reg data and other cost related information if possible.
e. Strides <= CLS belong to same ref. group.

Diff Detail

Event Timeline

tvvikram updated this revision to Diff 60008.Jun 8 2016, 12:57 AM
tvvikram retitled this revision from to Cache aware Loop Cost Analysis.
tvvikram updated this object.
tvvikram updated this object.Jun 9 2016, 9:36 AM
aprantl added a subscriber: aprantl.Jun 9 2016, 1:01 PM
suyog added a subscriber: suyog.Jun 29 2016, 2:21 PM
jfb edited edge metadata.Jun 29 2016, 4:09 PM

Quick review, nothing technical.

include/llvm/Analysis/LoopCostAnalysis.h
43

Typo "resue"

46

Typo "implemenation"

50

enum class is nicer IMO. You can also rename it to Way instead of CacheWay.

67

This ctor isn't used. It looks like you could get rid of the default ctor and initCacheData if you have access to the target at construction time (do you? I'm not sure, it would be good to wire that up).

If you keep this ctor, it should be written as:

CacheData(unsigned LineSize, unsigned CacheSize,
          CacheData::CacheWay Associativity)
    : LineSize(LineSize), CacheSize(CacheSize), Associativity(Associativity) {}

You can't use identifiers starting with _[A-Z], and you can reuse the same name.

77

The three setters above aren't used, and don't seem useful given that you have initCacheData.

82

Could you address this TODO? That's the part I'm most interested in :)
That could be a separate patch if you want to keep things minimal.

91

LLVM doesn't usually name things with "_t".

97

enum class here as well.

100

Column major is never used in the code. Is it going to be added later?

116

Weird format here. Could you run new code through clang-format?

116

Wouldn't float be sufficient for this purpose?

lib/Analysis/LoopCostAnalysis.cpp
22

?

35

Use a range-based for loop here. There are other places below where you should also use them.

161

Use a constexpr for this, or a cl::opt if you want to be able to set it from the command line.

165

You can push_back({L, TripCount})

193

"its access"

203

Delete?

204

Naming isn't consistent with other parts for the LLVM code (I and BB), which you seem to try to follow in general (e.g. GEP).

test/Analysis/CostModel/X86/matmul-perfect-loopCost.ll
23

Is your algorithm stable enough that every platform will produce exactly the same double result? You may want to print out the double differently so it's easier to ignore bits of lower significance.

In D21124#470684, @jfb wrote:

Quick review, nothing technical.

Thanks! Will address your comments, but few inline comments.

include/llvm/Analysis/LoopCostAnalysis.h
82

I will work on this and submit a separate patch.

91

"LoopNestType" is better?

100

It is used in LoopCostAnalysis.cpp:334. Can GEP represent column major accesses? If ColumnMajor producers like Fortran frontend convert column major accesses to row major accesses, then the entire enum Order will not be necessary.

116

For a loop with trip count 10^6 and a deep nesting of say 5 level nest, the algorithm computes a penalty of (10^6)^5 = 10^30 for the outermost level loop. Since it can easily reach FLOAT_MAX, I had kept it double.

lib/Analysis/LoopCostAnalysis.cpp
22

LoopCost computation works on loop nests. Some of the code below populates perfect nests from LoopInfo. FWIW, I think it can be moved to LoopUtils, so that other optimizations can reuse it.

test/Analysis/CostModel/X86/matmul-perfect-loopCost.ll
23

LSBs can vary, I guess. Will shorten to 4 decimal places.

hfinkel edited edge metadata.Oct 12 2016, 9:38 AM

Vikram, are you still interested in working on this?

Vikram, are you still interested in working on this?

Yes. I will update the patch in a few days.

jfb requested changes to this revision.May 7 2018, 10:14 PM

Subsumed by https://llvm.org/docs/CommandGuide/llvm-mca.html ? Can we close this?

This revision now requires changes to proceed.May 7 2018, 10:14 PM

llvm-mca doesn't subsume this patch.

jfb added a comment.May 8 2018, 8:43 AM

llvm-mca doesn't subsume this patch.

Fair. Is this patch going somewhere, or is it abandoned?

In D21124#1091393, @jfb wrote:

llvm-mca doesn't subsume this patch.

Fair. Is this patch going somewhere, or is it abandoned?

I am interested (but someone should fund!).

jfb resigned from this revision.May 8 2018, 9:14 AM

Doesn't sound like I should expect progress for now. Happy to help review if something changes.