This patch updates the cost function to use the number of bytes that fit
into cache lines to decide whether interchanging loops is beneficial.
Loops are interchanged if more bytes fit into cache lines after swapping
the order.
By operating on the pointer operand of load and store instructions and
only using SCEV, we can also support accesses where the pointer
calculation is more complex, e.g. for loops like

  for (int i = 0; i < n; i++)
    for (int j = 1; j < 25; j++)
      A[j*25+i] = B[n+i];

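
As a rough illustration of why the loop order matters for the loop above
(a standalone sketch, not the code in this patch), the function below walks
the stores to A in either loop order and counts how often consecutive stores
land in a different cache line. The 64-byte line size, 4-byte int elements,
an A that starts on a line boundary, and the name count_line_switches are
all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes; real targets may differ. */
enum { CACHE_LINE_BYTES = 64 };

/*
 * Hypothetical helper: iterate over the stores A[j*25+i] of the example
 * loop nest and count how often two consecutive stores fall into
 * different cache lines (the first store counts as one switch).
 * interchanged == 0: i is the outer loop, j the inner loop (as written).
 * interchanged != 0: j is the outer loop, i the inner loop.
 */
size_t count_line_switches(int n, int interchanged)
{
    assert(n > 0);

    size_t switches = 0;
    long prev_line = -1;

    /* Loop bounds follow the example: i in [0, n), j in [1, 25). */
    int outer_lo = interchanged ? 1 : 0;
    int outer_hi = interchanged ? 25 : n;
    int inner_lo = interchanged ? 0 : 1;
    int inner_hi = interchanged ? n : 25;

    for (int outer = outer_lo; outer < outer_hi; outer++) {
        for (int inner = inner_lo; inner < inner_hi; inner++) {
            int i = interchanged ? inner : outer;
            int j = interchanged ? outer : inner;

            /* Byte offset of A[j*25+i] from the (assumed aligned) base. */
            long byte_offset = (long)(j * 25 + i) * (long)sizeof(int);
            long line = byte_offset / CACHE_LINE_BYTES;

            if (line != prev_line) {
                switches++;
                prev_line = line;
            }
        }
    }
    return switches;
}
```

With n = 16 and 4-byte ints, the original order switches lines on every one
of the 384 stores, because consecutive j iterations are 100 bytes apart. In
the interchanged order each row of 16 stores spans only 64 bytes, so there
are at most 48 switches.
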
This cost function gives a speed-up on a few benchmarks, with no noticeable
regressions, across a wide range of benchmarks from the test-suite, SPEC2000,
SPEC2006 and a range of proprietary suites on AArch64.