If tail-folding of the scalar remainder loop is applied, the primary induction variable is splatted into a vector and used by the masked load/store vector instructions, so the IV does not remain scalar.
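To make the point above concrete, here is a minimal scalar simulation of a tail-folded copy loop. It is only a sketch, not the vectorizer's actual output: the names `tailFoldedCopy`, `vecIV`, and `mask`, and the choice of `VF = 4`, are assumptions made for illustration. It shows why the IV is needed in vector form: each lane's index and mask bit are derived from the splatted IV plus the lane number.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative vectorization factor (an assumption, not taken from the patch).
constexpr std::size_t VF = 4;

// Copies src into dst with a tail-folded "vector" loop: there is no scalar
// remainder loop; the last iteration simply runs with some lanes masked off.
std::vector<int> tailFoldedCopy(const std::vector<int> &src) {
  const std::size_t tripCount = src.size();
  std::vector<int> dst(tripCount, 0);

  for (std::size_t iv = 0; iv < tripCount; iv += VF) {
    std::array<std::size_t, VF> vecIV; // splat(iv) + <0, 1, ..., VF-1>
    std::array<bool, VF> mask;         // lane is active iff its index is in range
    for (std::size_t lane = 0; lane < VF; ++lane) {
      vecIV[lane] = iv + lane;
      mask[lane] = vecIV[lane] < tripCount;
    }

    // Masked load followed by masked store: inactive lanes touch no memory.
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (mask[lane])
        dst[vecIV[lane]] = src[vecIV[lane]];
  }

  assert(dst.size() == src.size());
  return dst;
}
```

With six elements and `VF = 4`, the second iteration runs with only the first two lanes active, so the mask, and therefore the vector form of the IV, is what replaces the remainder loop.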
This is a minimal change extracted from D78353, and is a useful fix on its own. It triggers in function @example2 in test case test/Transforms/LoopVectorize/X86/small-size.ll. Please note that there are quite a few changes in this file: it was partly auto-generated and partly hand-edited, and I've regenerated all expected output.
This is now called only for VF=1, so it doesn't really create vectors by splatting a scalar IV; it only takes care of bumping the IV across the UF parts. It can be cleaned up further in a follow-up patch.
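As a rough illustration of what "bumping across UF parts" amounts to when VF=1, the sketch below computes one scalar IV value per unrolled part. The helper name `scalarIVParts` and its signature are hypothetical and do not correspond to any function in the vectorizer; the sketch assumes part `p` simply receives `iv + p * step`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: with VF=1 there is nothing to splat, so each of the
// UF unrolled parts just gets the scalar IV advanced by `part` steps.
std::vector<long> scalarIVParts(long iv, long step, unsigned UF) {
  assert(UF > 0 && "expected at least one unrolled part");
  std::vector<long> parts;
  parts.reserve(UF);
  for (unsigned part = 0; part < UF; ++part)
    parts.push_back(iv + static_cast<long>(part) * step);
  return parts;
}
```

For example, under these assumptions an IV of 10 with step 2 and UF=4 yields the per-part values 10, 12, 14, 16.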