Motivated by pr43326 (https://bugs.llvm.org/show_bug.cgi?id=43326). What is significant there is the loop is a triply-nested loop. And a slightly modified test case is as follows.
void f(int e[10][10][10], int f[10][10][10]) { for (int a = 0; a < 10; a++) { for (int b = 0; b < 10; b++) { for (int c = 0; c < 10; c++) { f[c][b][a] = e[c][b][a]; } } } }
The ideal optimal access pattern after running interchange is supposed to be the following
void f(int e[10][10][10], int f[10][10][10]) { for (int c = 0; c < 10; c++) { for (int b = 0; b < 10; b++) { for (int a = 0; a < 10; a++) { f[c][b][a] = e[c][b][a]; } } } }
Currently we interchange two times, changing the loop order from a->b->c to a->c->b and c->a->b. In other words, we are limited to picking up the innermost loop and finding an order that is locally optimal for it. However, the pass failed to produce the optimal loop access order which is c->b->a. For more complex examples what we get would be somehow quite far from the globally optimal ordering.
What is proposed in this pattern is to do a "bubble-sort" fashion when doing interchange. By comparing neighbors in LoopList in each iteration, we would be able to move each loop onto a most appropriate place, hence this is an approach that tries to achieve the globally optimal ordering.
To be specific, for the motivating example, the original order is
a->b->c
In the first iteration, we first interchange b and c, which gives us
a->c->b,
then we interchange a and c, which gives us
c->a->b.
Then in the second iteration, we interchange a and b, which gives us
c->b->a,
which is the ideal ordering.
This approach is supposed to work for however complex cases as well. In fact for the real test case in pr43326, the loop has reductions and currently loop interchange is not able to handle multi-level reduction loops (more than 2 levels). I currently limit this patch to deal only with simple multi-level loops, and I would be willing to post follow-up patches that enable interchange with multi-level reductions.
Comments on this patch are appreciated :)
If you don't assume the vector is empty when starting (assert(LoopList.empty())?), consider clearing the list at the start of the function.
Change seems NFC anyway.