Hi,
I am proposing a patch to prevent passes from destroying canonical loop structure, especially for nested loop. When eliminating or merging empty/small basic blocks, LLVM checks the existence of non-trivial PHI nodes to recognize potential loops and choose not to eliminate the block. However, the current PHI node based algorithm fails to recognize a loop if the loop’s exit condition is evaluated only with volatile values hence no PHI nodes in the loop header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified form in LoopSimplify and loop vectorization).
The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. Jump Threading already builds a set of loop headers by identifying backedges, so we can simply reuse the set. For SimplifyCFG, such a set of loop headers can be created per function in iterativelySimplifyCFG().
The following example loops are collapsed by JumpThreading or SimplifyCFG and the LLVM loop vectorizer cannot vectorize the resulting loop. With the patch, they maintain loop structure and get vectorized.
(1) Loop with a volatile iteration variable
void foo2(float * restrict x, float * restrict y, float * restrict z) {
for (volatile int i = 0; i < 1000; ++i) { for (int j = 0; j < 1600; ++j) { x[j] = y[j] + z[j]; } }
}
(2) While loop with volatile arguments for exit function
int done(float *x, float *y, float *z);
void foo3(float * restrict x, float * restrict y, float * restrict z) {
while (!done(x, y, z)) { for (int j = 0; j < 1600; ++j) { x[j] = y[j] + z[j]; } }
}
The patch fails to pass four tests for LoopUnswitch and SimplifyCFG. Two failed tests for LoopUnswitch are simply due to BB name differences and can be trivially resolved. Two failed tests for SimplifyCFG include two nested loops with an empty outer loop header which were supposed to be merged with the inner loop but not any more after the patch. I propose changes to these tests too. The empty blocks will get eliminated when SimplifyCFG is executed again later in the back-end. Therefore, extra branches in test IR's will not affect final codes.
Please update the comment to explain the new parameter. As is, it is not even clear whether this is an optional input or output.