This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. This effort is aimed at simplifying
the existing vectorization algorithm so that we can introduce support for
'iter_args' (reductions) [1] and more advanced vectorization scenarios in the
future. More detailed information about the vectorization approach is included
in the patch. These are the most important changes introduced by the new
algorithm:
- Removed tracking of root and terminal ops. Existing vectorization functionality is preserved and extended so that loop nests without root-terminal chains can be vectorized.
- Vectorizing a loop nest now only requires a single topological traversal.
- A new vector loop nest is incrementally built along the vectorization process. The original scalar loop is kept intact. No cloning guard is needed to recover the scalar loop if vectorization fails. This approach also simplifies the challenging task of replacing a loop operation amid the vectorization process without invalidating the analysis information that depends on the original loop.
- Vectorization of each supported operation is now implemented independently, which prepares these implementations to be moved to a potential vectorization interface.
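
As a rough illustration of the points above (not the code in this patch), here is a toy sketch of a single topological traversal that builds a new vector body while leaving the scalar body untouched. Everything in it is hypothetical: the `Op` class, the `SUPPORTED` set, the `vectorize_body` helper and the `_v<width>` naming are made up for the example and do not correspond to MLIR APIs. It also assumes that the body is already listed in topological order (definitions before uses) and that values defined outside the body can be reused unchanged, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Op:
    """Toy stand-in for an operation in a loop body (hypothetical, not MLIR)."""
    name: str             # result name, e.g. "%a"
    kind: str             # operation kind, e.g. "load", "addf", "store"
    operands: tuple = ()  # names of values this op uses


# Hypothetical set of op kinds this toy vectorizer knows how to handle.
SUPPORTED = {"load", "addf", "mulf", "store"}


def vectorize_body(body: List[Op], width: int) -> Optional[List[Op]]:
    """Build a vector body in one pass over `body`.

    `body` is assumed to be in topological order, so every operand defined
    inside the body has already been visited when its user is reached.
    Returns None on failure; `body` itself is never modified.
    """
    scalar_to_vector: Dict[str, str] = {}  # scalar result -> vector result
    vector_body: List[Op] = []

    for op in body:
        if op.kind not in SUPPORTED:
            # Give up: the partially built vector body is simply dropped.
            return None
        # Operands defined in the body map to their vector counterparts;
        # operands defined outside are reused as-is (a simplification).
        vec_operands = tuple(scalar_to_vector.get(o, o) for o in op.operands)
        vec_name = f"{op.name}_v{width}"
        vector_body.append(Op(vec_name, f"vector.{op.kind}", vec_operands))
        scalar_to_vector[op.name] = vec_name

    return vector_body


# Usage: a body with no special root or terminal op tracking.
scalar_body = [
    Op("%a", "load", ("%A",)),
    Op("%b", "load", ("%B",)),
    Op("%c", "addf", ("%a", "%b")),
    Op("%s", "store", ("%c", "%C")),
]
vector_body = vectorize_body(scalar_body, 4)
```

Because the vector body is built on the side, a failed attempt needs no recovery step in this sketch: the caller just keeps using `scalar_body`.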
Initial support for 'iter_args' will be introduced in follow-up patches. I already
have the first scenario [1] working locally.
It's very likely that you will also find more improvement opportunities while looking
at the code since this pass is a bit "old". Feel free to bring them up. We can
address those that are strictly related to the new code in this review and follow
up with further changes. I would prefer to keep this patch as focused as possible
since it's already quite large.
Thanks!
Diego
[1] https://llvm.discourse.group/t/rfc-adding-reduction-support-iter-args-to-the-affine-vectorizer/2807
Is this layout picked by lint (or can we have the return type on the same line and break the args earlier)?