Implement an analysis pass that calculates loop cost based on cache data.
The patch forms groups of references that would lie in the same cache line. Each group is then analysed with respect to the innermost loop, in terms of cache lines. The penalty for a reference is:
a. 1, if the reference is invariant in the innermost loop,
b. TripCount for non-unit stride access,
c. TripCount / CacheLineSize for a unit-stride access.
The loop cost is then calculated as the sum of the reference penalties multiplied by the product of the loop bounds of the outer loops. It can be used as a profitability measure for cache-reuse-related optimizations. This is only a brief description; please refer to http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf for the details.
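The penalty rules and the cost formula above can be sketched as follows. This is only an illustrative sketch, not the code in the patch: the names AccessKind, referencePenalty and loopCost are hypothetical, CacheLineSize is assumed to be expressed in array elements, and every reference group is assumed to be already classified with respect to the innermost loop.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical classification of a reference group with respect to the
// innermost loop (not a type from the patch).
enum class AccessKind { Invariant, UnitStride, NonUnitStride };

// Penalty of one reference group, following rules a-c above.
// Assumption: CacheLineSize is measured in array elements.
double referencePenalty(AccessKind Kind, uint64_t TripCount,
                        uint64_t CacheLineSize) {
  switch (Kind) {
  case AccessKind::Invariant:
    return 1.0; // same cache line on every iteration
  case AccessKind::NonUnitStride:
    return static_cast<double>(TripCount); // new cache line each iteration
  case AccessKind::UnitStride:
    return static_cast<double>(TripCount) / CacheLineSize;
  }
  return 0.0; // unreachable for valid enum values
}

// Loop cost: sum of the reference penalties times the product of the
// outer-loop bounds.
double loopCost(const std::vector<AccessKind> &Groups,
                uint64_t InnerTripCount,
                const std::vector<uint64_t> &OuterTripCounts,
                uint64_t CacheLineSize) {
  double PenaltySum = 0.0;
  for (AccessKind Kind : Groups)
    PenaltySum += referencePenalty(Kind, InnerTripCount, CacheLineSize);

  double OuterProduct = 1.0;
  for (uint64_t TripCount : OuterTripCounts)
    OuterProduct *= static_cast<double>(TripCount);

  return PenaltySum * OuterProduct;
}
```

For example, with one invariant, one unit-stride and one non-unit-stride group, an innermost trip count of 64, outer loop bounds of 10 and 20, and a cache line of 8 elements, the penalties are 1 + 8 + 64 = 73 and the loop cost is 73 * 200 = 14600.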
Current drawbacks:
a. CacheLineSize is a static value.
b. Only perfect nests are handled.
c. Only a single basic block of the innermost loop is considered.
d. Register data and other cost-related information are not yet taken into account; add them if possible.
e. References with strides <= CacheLineSize belong to the same reference group.
Typo "resue"