This patch fixes the miscompile reported in https://reviews.llvm.org/D132055#3814831.
The IR that reproduces the bug is added as a test case in this patch. It contains a triply nested loop, and loop interchange swapped the outermost and the middle loops, which it should not do. The innermost loop performs a reduction using intrinsics and store instructions instead of phi nodes, i.e., there is no reduction phi node in the middle loop header.
We should prevent interchange in such cases. Currently, however, the pass does interchange because findInductionAndReductions() only checks phi nodes in InnerLoop and OuterLoop, which are the middle loop and the outermost loop in the problematic IR. The phi node in the innermost loop that corresponds to the reduction operation, i.e., %vec.phi = phi <4 x i32> [ %0, %for.cond4.preheader.i ], [ %16, %vector.body ], is never checked. We should check it and have findInductionAndReductions() return false, since %vec.phi is not something the pass can handle.
Instead of checking the phi nodes only in InnerLoop and OuterLoop, this patch checks the phi nodes in all subloops of OuterLoop. That covers the innermost loop in addition to the outermost and the middle loops. findInductionAndReductions() now bails out when it reaches the innermost loop, so the middle and the outermost loops are not interchanged.
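To illustrate the difference between the two checks, here is a minimal, self-contained sketch on a toy loop model. It is not the code in this patch and does not use LLVM's Loop or PHINode classes: ToyLoop, PhiKind, headerPhisSupported() and allHeaderPhisSupported() are hypothetical names made up for the example, and "Unsupported" stands in for a phi such as %vec.phi that the pass cannot classify.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for how a header phi might be classified.
enum class PhiKind { Induction, SupportedReduction, Unsupported };

// Hypothetical stand-in for a loop: its header phis and its nested loops.
struct ToyLoop {
  std::vector<PhiKind> HeaderPhis;
  std::vector<ToyLoop> SubLoops;
};

// Checks only the header phis of L itself. Calling this on the outer and
// the inner loop of the pair being interchanged models the old behavior.
bool headerPhisSupported(const ToyLoop &L) {
  for (PhiKind K : L.HeaderPhis)
    if (K == PhiKind::Unsupported)
      return false;
  return true;
}

// Checks the header phis of L and, recursively, of every loop nested in L.
// This models the idea of visiting all subloops of the outer loop.
bool allHeaderPhisSupported(const ToyLoop &L) {
  if (!headerPhisSupported(L))
    return false;
  for (const ToyLoop &Sub : L.SubLoops)
    if (!allHeaderPhisSupported(Sub))
      return false;
  return true;
}
```

With a three-deep nest whose innermost header holds an Unsupported phi, checking only the outermost and middle loops succeeds, while the recursive check fails and would block the interchange.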
[nit] LLVM does not use Almost-Always-Auto,