Hello!
We ran some experiments with the LoopFusion pass and found an issue when fusing two loop nests that have zero-trip-count guards. We thought it would be interesting to contribute a fix, assuming it makes sense!
The problem is that, after fusion, an unreachable block (FC0.ExitBlock) is not removed from LoopInfo (LI). LI verification then fails whenever that block is inside a loop, since unreachable blocks are not allowed to be part of a loop.
Whether the issue shows up depends on the number of loops per nest and the form of the input loops. This is what we have found so far:
- For single loop nests, the unreachable block (FC0.ExitBlock) is not nested in any other loop so LI verification doesn't complain.
- For double loop nests in canonical form, the unreachable block resulting from fusing the inner loops is still nested in the outer loop so LI verification fails.
- For non-canonical double loop nests, no unreachable block seems to be generated, probably because the loops don't have dedicated exits.
- For loop nests with more than two loops, unreachable blocks are generated regardless of whether the loops are in canonical or non-canonical form, so we have the same problem as in the canonical double loop nest case.
The fix removes FC0.ExitBlock from LI after fusion. We also decided to remove FC1GuardBlock from LI and DT so that both analyses stay consistent until the FC1 loop is removed later in the code, in case they are queried before that point in the future. Hopefully we are not missing anything else.
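To make the failure mode concrete, here is a minimal, self-contained sketch of the invariant and of the shape of the fix. It is a toy model, not LLVM code: `ToyCFG`, `ToyLoopMap`, the block names, and all helper functions are hypothetical stand-ins for the CFG and for LI's block-to-loop mapping, and the CFG is a simplified guess at what a fused double nest looks like.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for a CFG and for LI's block-to-loop map (not LLVM types).
struct ToyCFG {
  std::string Entry;
  std::map<std::string, std::vector<std::string>> Succs;
};
using ToyLoopMap = std::map<std::string, std::string>; // block -> innermost loop

// Collect every block reachable from the entry block (iterative DFS).
std::set<std::string> reachableBlocks(const ToyCFG &G) {
  assert(!G.Entry.empty() && "CFG needs an entry block");
  std::set<std::string> Seen{G.Entry};
  std::vector<std::string> Work{G.Entry};
  while (!Work.empty()) {
    std::string BB = Work.back();
    Work.pop_back();
    auto It = G.Succs.find(BB);
    if (It == G.Succs.end())
      continue;
    for (const std::string &S : It->second)
      if (Seen.insert(S).second)
        Work.push_back(S);
  }
  return Seen;
}

// The invariant: no unreachable block may be recorded as part of a loop.
bool verifyToyLoopMap(const ToyCFG &G, const ToyLoopMap &LM) {
  std::set<std::string> Reach = reachableBlocks(G);
  for (const auto &KV : LM)
    if (!Reach.count(KV.first))
      return false;
  return true;
}

// The shape of the fix: drop the dead block from the loop map.
void removeBlockFromLoops(ToyLoopMap &LM, const std::string &BB) {
  LM.erase(BB);
}

// Hypothetical double nest after fusing the inner loops: "fc0.exit" has
// lost all of its predecessors but still has an outgoing edge.
ToyCFG makeFusedDoubleNest() {
  ToyCFG G;
  G.Entry = "entry";
  G.Succs["entry"] = {"outer.header"};
  G.Succs["outer.header"] = {"inner.fused", "exit"};
  G.Succs["inner.fused"] = {"inner.fused", "outer.latch"};
  G.Succs["outer.latch"] = {"outer.header"};
  G.Succs["fc0.exit"] = {"outer.latch"}; // unreachable from "entry"
  return G;
}

// Stale loop map: the dead block is still recorded inside the outer loop.
ToyLoopMap makeStaleLoopMap() {
  return {{"outer.header", "outer"},
          {"inner.fused", "inner"},
          {"outer.latch", "outer"},
          {"fc0.exit", "outer"}};
}
```

With the stale map, `verifyToyLoopMap` reports a violation; after `removeBlockFromLoops(LM, "fc0.exit")` it passes. A single loop nest corresponds to a map that never contained the dead block, which is why that case goes unnoticed.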
The patch also includes a couple of tests, one with a double loop nest and one with a triple loop nest. Both were run through the regular optimizations, including jump threading and CFG simplification, and their loops are in canonical form. They don't include CHECK rules for now. I plan to add them once I have confirmation that this patch makes sense :). For experimentation, the double loop nest can be turned into non-canonical form by running SimplifyCFG on it.
On a side note, I wonder if we should only fuse loops in canonical form and bail out in other cases. That would reduce the number of loop "flavors" to be supported. A check for that could be added to the legality phase.
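As a rough illustration of what such a legality check might test, the sketch below models canonical (loop-simplify) form as the usual three properties: a preheader, a single backedge, and dedicated exits. Everything here is a hypothetical toy (`ToyLoop`, `ToyEdges`, `mayFuse`, and the block names are not LLVM types or functions); in the real pass the property would presumably come from the existing Loop API rather than be recomputed like this.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy successor lists keyed by block name (not an LLVM type).
using ToyEdges = std::map<std::string, std::vector<std::string>>;

// Toy loop summary; preheader and backedge facts are supplied by the caller.
struct ToyLoop {
  std::set<std::string> Blocks;
  bool HasPreheader;
  unsigned NumBackedges;
};

// An exit block is dedicated if all of its predecessors are inside the loop.
bool hasDedicatedExits(const ToyLoop &L, const ToyEdges &Succs) {
  // Exit blocks are successors of loop blocks that lie outside the loop.
  std::set<std::string> Exits;
  for (const std::string &BB : L.Blocks) {
    auto It = Succs.find(BB);
    if (It == Succs.end())
      continue;
    for (const std::string &S : It->second)
      if (!L.Blocks.count(S))
        Exits.insert(S);
  }
  // Any edge into an exit block from outside the loop disqualifies it.
  for (const auto &KV : Succs) {
    if (L.Blocks.count(KV.first))
      continue;
    for (const std::string &S : KV.second)
      if (Exits.count(S))
        return false;
  }
  return true;
}

// Canonical form in this toy model: preheader, one backedge, dedicated exits.
bool isCanonicalToyLoop(const ToyLoop &L, const ToyEdges &Succs) {
  return L.HasPreheader && L.NumBackedges == 1 && hasDedicatedExits(L, Succs);
}

// Hypothetical legality gate: bail out unless both candidates are canonical.
bool mayFuse(const ToyLoop &A, const ToyLoop &B, const ToyEdges &Succs) {
  return isCanonicalToyLoop(A, Succs) && isCanonicalToyLoop(B, Succs);
}
```

A loop whose exit block is also reached from outside the loop fails `hasDedicatedExits`, so `mayFuse` would reject it instead of fusing a shape the pass has not been tested on.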
Thanks!
Diego