When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip count TC does not overflow, if TC is a known constant, and if TC is a multiple of the VF. Targets with efficient vector masking can overcome the last three, TC-related conditions; see “Direction #1” in [[ http://lists.llvm.org/pipermail/llvm-dev/2018-August/125042.html | [llvm-dev] Vectorizing remainder loop ]]. This patch applies that transformation: it sets the trip count of the vector loop to TC rounded up to a multiple of VF and masks the vector body under a newly introduced "if (i < TC)" condition, or rather "if (i <= TC-1)", to avoid the aforementioned overflow hazard.
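To make the transformation concrete, here is a minimal scalar simulation of a tail folded by masking. It is only a sketch, not the code the vectorizer emits: `VF = 4`, the function names, and the lane-by-lane inner loop that stands in for a masked vector operation are all assumptions made for illustration. It also assumes the loop executes at least once, so that TC-1 is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vectorization factor, chosen only for this illustration.
constexpr std::size_t VF = 4;

// Scalar reference loop: a[i] += b[i] for i in [0, TC).
void addScalar(std::vector<int> &a, const std::vector<int> &b, std::size_t TC) {
  assert(TC <= a.size() && TC <= b.size());
  for (std::size_t i = 0; i < TC; ++i)
    a[i] += b[i];
}

// Tail folded by masking: the outer loop runs for TC rounded up to a
// multiple of VF, and each lane is guarded by "i <= TC-1".
// Assumption: TC >= 1, i.e. the loop body executes at least once.
void addFoldedTail(std::vector<int> &a, const std::vector<int> &b,
                   std::size_t TC) {
  assert(TC >= 1 && TC <= a.size() && TC <= b.size());
  const std::size_t BTC = TC - 1;                    // backedge-taken count
  const std::size_t roundedTC = (BTC / VF + 1) * VF; // TC rounded up to VF
  for (std::size_t i = 0; i < roundedTC; i += VF) {
    // One "vector iteration": VF lanes, each executed under its mask bit.
    for (std::size_t lane = 0; lane < VF; ++lane) {
      const std::size_t idx = i + lane;
      const bool active = idx <= BTC; // the newly introduced mask
      if (active)
        a[idx] += b[idx];
    }
  }
}

// Why "i <= TC-1" rather than "i < TC": with an 8-bit counter, a loop that
// runs 256 times has a TC that wraps to 0, while TC-1 = 255 is representable.
bool laneActiveByTC(std::uint8_t i, std::uint8_t TC) { return i < TC; }
bool laneActiveByBTC(std::uint8_t i, std::uint8_t BTC) { return i <= BTC; }
```

With TC = 10 the folded loop runs three masked iterations of four lanes each, and the last two lanes of the final iteration are inactive, so no scalar remainder loop is needed.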
The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os.
Handling of loops with reductions and live-outs is marked as a TODO for subsequent extensions.
I think it's best not to keep this state in the Legal. From the Legal perspective, being able to vectorize the whole loop body under the mask and the actual decision to do so are completely separate issues.
Since canFold...() is invoked by CostModel::computeMaxVF, we should be able to keep this state in the CostModel. After all, whether to bail out or to continue under FoldTailByMasking is a cost-model decision, made after consulting the Legal.