As described in https://reviews.llvm.org/D122776, the current LoopCacheCost analysis cannot determine a profitable nesting for the outer loops of a nest more than two levels deep. For example, consider the first loop nest in the existing LIT test llvm/test/Analysis/LoopCacheAnalysis/PowerPC/matvecmul.ll. The loop nest looks like this:
;   for (int k=1;k<nz;++k)
;     for (int j=1;j<ny;++j)
;       for (int i=1;i<nx;++i)
;         for (int l=1;l<nb;++l)
;           for (int m=1;m<nb;++m)
;             y[k+1][j][i][l] = y[k+1][j][i][l] + b[k][j][i][m][l]*x[k][j][i][m]
and the costs of the k, j and i loops are all calculated to be 30000000000. The problem is that when a subject loop is considered as the innermost loop of the nest and the access pattern is not consecutive, the cost function returns the trip count of that loop as the estimated number of cache lines accessed. If the trip counts are equal (or unknown, in which case a default value of 100 is assumed), the costs of the outer loops will be identical. The cost function needs to give more weight to reference groups that use a function of the loop's IV as a subscript into outer dimensions. This patch tries to do that by multiplying the trip counts of the loops corresponding to the subscripts that come between the subject loop and the inner dimensions. That is, for a given reference group, the farther a subscript is from the innermost level, the higher the cost of moving the corresponding loop into the innermost position.
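
The weighting described above can be illustrated with a small standalone sketch. It is not the code in this patch: the `Reference` struct and the `nonConsecutiveCost` function are hypothetical names, loops are identified by their position in the nest (0 = outermost), each subscript is assumed to be driven by exactly one loop, and the last (innermost) dimension is assumed to be excluded from the product.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified model of an array reference: SubscriptLoops[d] is
// the nest position of the loop whose IV drives subscript d (outermost
// dimension first). Real subscripts are SCEV expressions; this is a sketch.
struct Reference {
  std::vector<std::size_t> SubscriptLoops;
};

// Estimated cost of a non-consecutive reference when SubjectLoop is moved to
// the innermost position. Starts from the subject loop's trip count and
// multiplies in the trip counts of the loops driving the subscripts that lie
// between the subject loop's subscript and the last dimension (assumption:
// the last dimension itself is not included).
std::uint64_t nonConsecutiveCost(const Reference &Ref,
                                 const std::vector<std::uint64_t> &TripCounts,
                                 std::size_t SubjectLoop) {
  std::uint64_t Cost = TripCounts[SubjectLoop];
  const std::size_t NumSubscripts = Ref.SubscriptLoops.size();

  // Locate the subscript driven by the subject loop.
  std::size_t Index = 0;
  while (Index < NumSubscripts && Ref.SubscriptLoops[Index] != SubjectLoop)
    ++Index;

  // Simplification: if the subject loop does not index this reference, fall
  // back to the plain trip count.
  if (Index == NumSubscripts)
    return Cost;

  // Multiply in the trip counts for the subscripts strictly after the subject
  // loop's subscript and strictly before the last one.
  for (std::size_t I = Index + 1; I + 1 < NumSubscripts; ++I)
    Cost *= TripCounts[Ref.SubscriptLoops[I]];
  return Cost;
}
```

With the nest numbered k=0, j=1, i=2, l=3, m=4 and every trip count set to the default 100, the reference b[k][j][i][m][l] is modelled as {0, 1, 2, 4, 3}. Under the assumptions above the sketch gives 10^8 for k, 10^6 for j and 10^4 for i, so the three outer loops are no longer tied.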