This patch proposes to replace the current loop interchange cost model with a new one, namely the one returned by loop cache analysis.
Motivation
Given a loop nest, loop cache analysis returns a vector of loops [loop0, loop1, loop2, ...], where loop0 should be placed as the outermost loop, loop1 one level inside it, loop2 one level further in, and so on. Loop cache analysis is not only more comprehensive than the current cost model, it is also a "one-shot" query: we query it once for the entire loop interchange pass, whereas the current cost model is queried every time we check whether interchanging two loops is profitable. This saves compile time, especially after D120386, where we perform more interchanges to reach the globally optimal loop access pattern.
Changes made to test cases
There are some changes to the lit tests, most of them minor. One change that applies to all tests: I changed the target triple from "x86_64" to "aarch64" and added "-mcpu=tsv110" to the RUN lines. Loop cache analysis needs the cache line size, which it obtains from "TTI.getCacheLineSize()"; however, the x86 subtargets do not implement "getCacheLineSize()", so "TTI.getCacheLineSize()" just returns 0. With this change, "TTI.getCacheLineSize()" returns a valid number and loop cache analysis can proceed as normal.
*Update:* the target triple is now changed to powerpc, as per the review comments.
interchange-no-deps.ll: removed the test function "no_bad_order()", which only exercised the legacy cost model (the operands of gep instructions, to be specific) and does not represent a situation relevant to loop interchange: its memory accesses do not depend on the outer loop, so the outer loop should have been deleted, and this is not IR we would encounter in real situations. The new and legacy cost models give different results for this function, so I removed it.
interchanged-loop-nest-3.ll, not-interchanged-loop-nest-3.ll, not-interchanged-tightly-nested.ll: the IR was not entirely correct, since the target triple was "x86_64" but the getelementptr indices were 32 bits. The indices should be 64 bits, because pointer arithmetic on that target is 64-bit, so I changed them from i32 to i64; otherwise loop cache analysis triggers an SCEV assertion failure saying "scev operand types mismatch".
A note: we have not completely removed the legacy cost model; it is kept behind an opt flag. If we used only the new cost model, some lit tests would fail, because loop cache analysis relies on delinearization, and delinearization needs some enhancement: it currently cannot delinearize the accesses in some tests, in which case loop cache analysis just bails out. I will put the enhancement of delinearization into my next steps.
I am not sure we need the option; I slightly prefer removing it, because having an "EnableLegacy" option that defaults to "true" might incorrectly imply that we still use the old cost model, when in fact we use the new one in the majority of cases.