LoopDistribute currently bails out early if a loop is already
vectorizable. As a first step toward making the pass more generally
useful (with the eventual aim of enabling loop distribution by default
at -O3), this patch removes that restriction.
Originally, this pass was designed to separate the vectorizable parts of
a loop from its non-vectorizable parts, so that some of the resulting
loops can be vectorized. Loop distribution can be useful beyond that,
for example by improving the cache locality of the accesses in each loop.
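To illustrate the transformation described above, here is a minimal
source-level sketch. The function names (fused, distributed) and the
array names are hypothetical and chosen only for this example; the pass
itself works on LLVM IR, not on C++ source, and the sketch assumes the
arrays do not alias and all have the same length.

```cpp
#include <cstddef>
#include <vector>

// Before distribution: one loop mixes a loop-carried recurrence on `a`
// (not vectorizable) with an independent element-wise product into `c`
// (vectorizable).
void fused(std::vector<int>& a, const std::vector<int>& b,
           std::vector<int>& c, const std::vector<int>& d,
           const std::vector<int>& e) {
  for (std::size_t i = 0; i + 1 < a.size(); ++i) {
    a[i + 1] = a[i] * b[i];  // depends on the previous iteration
    c[i] = d[i] * e[i];      // independent across iterations
  }
}

// After distribution: each statement gets its own loop. The second loop
// no longer carries the recurrence and is a candidate for vectorization.
void distributed(std::vector<int>& a, const std::vector<int>& b,
                 std::vector<int>& c, const std::vector<int>& d,
                 const std::vector<int>& e) {
  for (std::size_t i = 0; i + 1 < a.size(); ++i)
    a[i + 1] = a[i] * b[i];
  for (std::size_t i = 0; i + 1 < a.size(); ++i)
    c[i] = d[i] * e[i];
}
```

Both functions compute the same results as long as the arrays are
distinct; whether the split is profitable is a cost-model question.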
With this change, all vectorizable loads and stores end up in individual
partitions, only to be merged back together. With
--loop-distribute-merge-vectorizable-partitions=false, however, the pass
distributes as much as possible, allowing us to start iterating on the
cost model.
To prevent removeUnusedInsts() from creating undefs outside of the loop,
replace any uses of seed instructions. For each value used outside of
the loop there is exactly one partition that uses it as a seed, thanks
to findDefsUsedOutsideOfLoop(). This guarantees that all uses outside of
the loop are mapped to the correct partition.
This change, together with
--loop-distribute-merge-vectorizable-partitions=false (and
--enable-loop-distribute), distributes many more loops in the LLVM test
suite, with very mixed performance results.
Follow-up patches will work on a cost model to improve the performance
impact of the pass.