We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes.
There are 4 alternate options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws:
- InstCombine - can't happen without TTI, but we don't want target-specific folds there.
- SDAG - too late to assist other vectorization passes; TLI is not equipped for these kind of cost queries; limited to a single basic block.
- CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine.
- SLP - doesn't fit with existing transforms; limited to a single basic block.
This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable.
We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with a most basic case and evolve from there, so I didn't try to do anything fancy here.
Is this counter used?